{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "de539e5a",
   "metadata": {},
   "source": [
    "# Report 02 - Titanic\n",
    "    Name: 刘晓冉\n",
    "    Student ID: 2021300706\n",
    "\n",
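    "### Task Overview\n",
    "   The Titanic challenge hosted by Kaggle asks participants to predict which passengers survived the sinking of the Titanic, based on a set of variables describing each passenger (such as age, sex, or passenger class). It is essentially a classification problem: a classifier is trained on the known features and survival outcomes in the training set and then used to predict survival for the passengers in the test set. The competition provides two datasets, both CSV files containing passenger information such as name, age, sex, and socio-economic class (plus an example submission file, gender_submission.csv). train.csv contains the details of 891 passengers on board, together with whether each of them survived. test.csv contains similar information but does not disclose whether each passenger survived. The main task is to predict these outcomes: using the model fitted on train.csv, predict whether the other 418 passengers on board (test.csv) survived.\n",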
    "\n",
    "\n",
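    "### Approach\n",
    "    Pipeline: import packages and data, exploratory data analysis, feature engineering, feature selection, model training, and hyperparameter tuning\n",
    "    Method: logistic regression"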
   ]
  },
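  {
   "cell_type": "markdown",
   "id": "sketch-logreg-pipeline",
   "metadata": {},
   "source": [
    "The approach above ends in a logistic-regression classifier. As a rough illustration only (not the code used later in this report), preprocessing and the model can be chained with scikit-learn as below. The toy rows are made up, and the feature subset (`Pclass`, `Sex`, `Age`, `Fare`) and the preprocessing choices (median imputation for numeric columns, one-hot encoding for categorical ones) are assumptions of this sketch.\n",
    "\n",
    "```python\n",
    "import pandas as pd\n",
    "from sklearn.compose import ColumnTransformer\n",
    "from sklearn.impute import SimpleImputer\n",
    "from sklearn.linear_model import LogisticRegression\n",
    "from sklearn.pipeline import Pipeline\n",
    "from sklearn.preprocessing import OneHotEncoder\n",
    "\n",
    "# Made-up rows that only borrow the column names of train.csv\n",
    "toy = pd.DataFrame({\n",
    "    'Pclass': [1, 3, 2, 3, 1, 3],\n",
    "    'Sex': ['female', 'male', 'female', 'male', 'male', 'female'],\n",
    "    'Age': [38.0, 22.0, None, 35.0, 54.0, 4.0],\n",
    "    'Fare': [71.28, 7.25, 13.0, 8.05, 51.86, 16.7],\n",
    "    'Survived': [1, 0, 1, 0, 0, 1],\n",
    "})\n",
    "X, y = toy.drop(columns='Survived'), toy['Survived']\n",
    "\n",
    "# Numeric columns: fill gaps with the median; categorical columns: one-hot encode\n",
    "preprocess = ColumnTransformer([\n",
    "    ('num', SimpleImputer(strategy='median'), ['Age', 'Fare']),\n",
    "    ('cat', OneHotEncoder(handle_unknown='ignore'), ['Pclass', 'Sex']),\n",
    "])\n",
    "model = Pipeline([\n",
    "    ('prep', preprocess),\n",
    "    ('clf', LogisticRegression(max_iter=1000)),\n",
    "])\n",
    "model.fit(X, y)\n",
    "pred = model.predict(X)  # in the real task this would be the test features\n",
    "```\n",
    "\n",
    "Wrapping the steps in a `Pipeline` means the imputation medians and one-hot categories are learned from the training rows only and then reused unchanged when predicting on test.csv."
   ]
  },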
  {
   "cell_type": "markdown",
   "id": "f6d69c91",
   "metadata": {},
   "source": [
    "### Import Packages and Data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 291,
   "id": "e513589b",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>PassengerId</th>\n",
       "      <th>Survived</th>\n",
       "      <th>Pclass</th>\n",
       "      <th>Name</th>\n",
       "      <th>Sex</th>\n",
       "      <th>Age</th>\n",
       "      <th>SibSp</th>\n",
       "      <th>Parch</th>\n",
       "      <th>Ticket</th>\n",
       "      <th>Fare</th>\n",
       "      <th>Cabin</th>\n",
       "      <th>Embarked</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>3</td>\n",
       "      <td>Braund, Mr. Owen Harris</td>\n",
       "      <td>male</td>\n",
       "      <td>22.0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>A/5 21171</td>\n",
       "      <td>7.2500</td>\n",
       "      <td>NaN</td>\n",
       "      <td>S</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>Cumings, Mrs. John Bradley (Florence Briggs Th...</td>\n",
       "      <td>female</td>\n",
       "      <td>38.0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>PC 17599</td>\n",
       "      <td>71.2833</td>\n",
       "      <td>C85</td>\n",
       "      <td>C</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>3</td>\n",
       "      <td>1</td>\n",
       "      <td>3</td>\n",
       "      <td>Heikkinen, Miss. Laina</td>\n",
       "      <td>female</td>\n",
       "      <td>26.0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>STON/O2. 3101282</td>\n",
       "      <td>7.9250</td>\n",
       "      <td>NaN</td>\n",
       "      <td>S</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>4</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>Futrelle, Mrs. Jacques Heath (Lily May Peel)</td>\n",
       "      <td>female</td>\n",
       "      <td>35.0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>113803</td>\n",
       "      <td>53.1000</td>\n",
       "      <td>C123</td>\n",
       "      <td>S</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>5</td>\n",
       "      <td>0</td>\n",
       "      <td>3</td>\n",
       "      <td>Allen, Mr. William Henry</td>\n",
       "      <td>male</td>\n",
       "      <td>35.0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>373450</td>\n",
       "      <td>8.0500</td>\n",
       "      <td>NaN</td>\n",
       "      <td>S</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>886</th>\n",
       "      <td>887</td>\n",
       "      <td>0</td>\n",
       "      <td>2</td>\n",
       "      <td>Montvila, Rev. Juozas</td>\n",
       "      <td>male</td>\n",
       "      <td>27.0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>211536</td>\n",
       "      <td>13.0000</td>\n",
       "      <td>NaN</td>\n",
       "      <td>S</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>887</th>\n",
       "      <td>888</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>Graham, Miss. Margaret Edith</td>\n",
       "      <td>female</td>\n",
       "      <td>19.0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>112053</td>\n",
       "      <td>30.0000</td>\n",
       "      <td>B42</td>\n",
       "      <td>S</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>888</th>\n",
       "      <td>889</td>\n",
       "      <td>0</td>\n",
       "      <td>3</td>\n",
       "      <td>Johnston, Miss. Catherine Helen \"Carrie\"</td>\n",
       "      <td>female</td>\n",
       "      <td>NaN</td>\n",
       "      <td>1</td>\n",
       "      <td>2</td>\n",
       "      <td>W./C. 6607</td>\n",
       "      <td>23.4500</td>\n",
       "      <td>NaN</td>\n",
       "      <td>S</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>889</th>\n",
       "      <td>890</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>Behr, Mr. Karl Howell</td>\n",
       "      <td>male</td>\n",
       "      <td>26.0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>111369</td>\n",
       "      <td>30.0000</td>\n",
       "      <td>C148</td>\n",
       "      <td>C</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>890</th>\n",
       "      <td>891</td>\n",
       "      <td>0</td>\n",
       "      <td>3</td>\n",
       "      <td>Dooley, Mr. Patrick</td>\n",
       "      <td>male</td>\n",
       "      <td>32.0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>370376</td>\n",
       "      <td>7.7500</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Q</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>891 rows × 12 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "     PassengerId  Survived  Pclass  \\\n",
       "0              1         0       3   \n",
       "1              2         1       1   \n",
       "2              3         1       3   \n",
       "3              4         1       1   \n",
       "4              5         0       3   \n",
       "..           ...       ...     ...   \n",
       "886          887         0       2   \n",
       "887          888         1       1   \n",
       "888          889         0       3   \n",
       "889          890         1       1   \n",
       "890          891         0       3   \n",
       "\n",
       "                                                  Name     Sex   Age  SibSp  \\\n",
       "0                              Braund, Mr. Owen Harris    male  22.0      1   \n",
       "1    Cumings, Mrs. John Bradley (Florence Briggs Th...  female  38.0      1   \n",
       "2                               Heikkinen, Miss. Laina  female  26.0      0   \n",
       "3         Futrelle, Mrs. Jacques Heath (Lily May Peel)  female  35.0      1   \n",
       "4                             Allen, Mr. William Henry    male  35.0      0   \n",
       "..                                                 ...     ...   ...    ...   \n",
       "886                              Montvila, Rev. Juozas    male  27.0      0   \n",
       "887                       Graham, Miss. Margaret Edith  female  19.0      0   \n",
       "888           Johnston, Miss. Catherine Helen \"Carrie\"  female   NaN      1   \n",
       "889                              Behr, Mr. Karl Howell    male  26.0      0   \n",
       "890                                Dooley, Mr. Patrick    male  32.0      0   \n",
       "\n",
       "     Parch            Ticket     Fare Cabin Embarked  \n",
       "0        0         A/5 21171   7.2500   NaN        S  \n",
       "1        0          PC 17599  71.2833   C85        C  \n",
       "2        0  STON/O2. 3101282   7.9250   NaN        S  \n",
       "3        0            113803  53.1000  C123        S  \n",
       "4        0            373450   8.0500   NaN        S  \n",
       "..     ...               ...      ...   ...      ...  \n",
       "886      0            211536  13.0000   NaN        S  \n",
       "887      0            112053  30.0000   B42        S  \n",
       "888      2        W./C. 6607  23.4500   NaN        S  \n",
       "889      0            111369  30.0000  C148        C  \n",
       "890      0            370376   7.7500   NaN        Q  \n",
       "\n",
       "[891 rows x 12 columns]"
      ]
     },
     "execution_count": 291,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Import data-processing and visualization packages\n",
    "import numpy as np\n",
    "import pandas as pd\n",
    "import seaborn as sns   # data visualization\n",
    "import missingno as msno   # missing-value visualization\n",
    "from sklearn.preprocessing import OneHotEncoder\n",
    "from matplotlib import pyplot as plt\n",
    "%matplotlib inline\n",
    "# Load the data\n",
    "train = pd.read_csv(\"data/train.csv\")\n",
    "test = pd.read_csv(\"data/test.csv\")\n",
    "gender_submission = pd.read_csv(\"data/gender_submission.csv\")\n",
    "train"
   ]
  },
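  {
   "cell_type": "markdown",
   "id": "sketch-submission-format",
   "metadata": {},
   "source": [
    "`gender_submission.csv` is the example submission file that ships with the competition: one row per test passenger with the two columns `PassengerId` and `Survived` (0 or 1). The following is a minimal sketch of producing a file in that format. The helper name `make_submission`, the toy IDs and predictions, and the output name `submission.csv` are illustrative assumptions.\n",
    "\n",
    "```python\n",
    "import io\n",
    "\n",
    "import pandas as pd\n",
    "\n",
    "\n",
    "def make_submission(passenger_ids, predictions):\n",
    "    # Two columns in the expected order; labels are cast to int so that\n",
    "    # predictions derived from a float 'Survived' column are written as 0/1, not 0.0/1.0\n",
    "    return pd.DataFrame({\n",
    "        'PassengerId': list(passenger_ids),\n",
    "        'Survived': pd.Series(predictions).astype(int).tolist(),\n",
    "    })\n",
    "\n",
    "\n",
    "# Hypothetical test IDs and predictions\n",
    "sub = make_submission([892, 893, 894], [0.0, 1.0, 0.0])\n",
    "buf = io.StringIO()\n",
    "sub.to_csv(buf, index=False)  # for the real file: sub.to_csv('submission.csv', index=False)\n",
    "csv_text = buf.getvalue()\n",
    "```\n",
    "\n",
    "Passing plain lists avoids pandas index alignment, so the IDs and predictions are paired purely by position."
   ]
  },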
  {
   "cell_type": "code",
   "execution_count": 292,
   "id": "9f0d4a95",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>PassengerId</th>\n",
       "      <th>Survived</th>\n",
       "      <th>Pclass</th>\n",
       "      <th>Name</th>\n",
       "      <th>Sex</th>\n",
       "      <th>Age</th>\n",
       "      <th>SibSp</th>\n",
       "      <th>Parch</th>\n",
       "      <th>Ticket</th>\n",
       "      <th>Fare</th>\n",
       "      <th>Cabin</th>\n",
       "      <th>Embarked</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1</td>\n",
       "      <td>0.0</td>\n",
       "      <td>3</td>\n",
       "      <td>Braund, Mr. Owen Harris</td>\n",
       "      <td>male</td>\n",
       "      <td>22.0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>A/5 21171</td>\n",
       "      <td>7.2500</td>\n",
       "      <td>NaN</td>\n",
       "      <td>S</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1</td>\n",
       "      <td>Cumings, Mrs. John Bradley (Florence Briggs Th...</td>\n",
       "      <td>female</td>\n",
       "      <td>38.0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>PC 17599</td>\n",
       "      <td>71.2833</td>\n",
       "      <td>C85</td>\n",
       "      <td>C</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>3</td>\n",
       "      <td>1.0</td>\n",
       "      <td>3</td>\n",
       "      <td>Heikkinen, Miss. Laina</td>\n",
       "      <td>female</td>\n",
       "      <td>26.0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>STON/O2. 3101282</td>\n",
       "      <td>7.9250</td>\n",
       "      <td>NaN</td>\n",
       "      <td>S</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>4</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1</td>\n",
       "      <td>Futrelle, Mrs. Jacques Heath (Lily May Peel)</td>\n",
       "      <td>female</td>\n",
       "      <td>35.0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>113803</td>\n",
       "      <td>53.1000</td>\n",
       "      <td>C123</td>\n",
       "      <td>S</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>5</td>\n",
       "      <td>0.0</td>\n",
       "      <td>3</td>\n",
       "      <td>Allen, Mr. William Henry</td>\n",
       "      <td>male</td>\n",
       "      <td>35.0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>373450</td>\n",
       "      <td>8.0500</td>\n",
       "      <td>NaN</td>\n",
       "      <td>S</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1304</th>\n",
       "      <td>1305</td>\n",
       "      <td>NaN</td>\n",
       "      <td>3</td>\n",
       "      <td>Spector, Mr. Woolf</td>\n",
       "      <td>male</td>\n",
       "      <td>NaN</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>A.5. 3236</td>\n",
       "      <td>8.0500</td>\n",
       "      <td>NaN</td>\n",
       "      <td>S</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1305</th>\n",
       "      <td>1306</td>\n",
       "      <td>NaN</td>\n",
       "      <td>1</td>\n",
       "      <td>Oliva y Ocana, Dona. Fermina</td>\n",
       "      <td>female</td>\n",
       "      <td>39.0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>PC 17758</td>\n",
       "      <td>108.9000</td>\n",
       "      <td>C105</td>\n",
       "      <td>C</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1306</th>\n",
       "      <td>1307</td>\n",
       "      <td>NaN</td>\n",
       "      <td>3</td>\n",
       "      <td>Saether, Mr. Simon Sivertsen</td>\n",
       "      <td>male</td>\n",
       "      <td>38.5</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>SOTON/O.Q. 3101262</td>\n",
       "      <td>7.2500</td>\n",
       "      <td>NaN</td>\n",
       "      <td>S</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1307</th>\n",
       "      <td>1308</td>\n",
       "      <td>NaN</td>\n",
       "      <td>3</td>\n",
       "      <td>Ware, Mr. Frederick</td>\n",
       "      <td>male</td>\n",
       "      <td>NaN</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>359309</td>\n",
       "      <td>8.0500</td>\n",
       "      <td>NaN</td>\n",
       "      <td>S</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1308</th>\n",
       "      <td>1309</td>\n",
       "      <td>NaN</td>\n",
       "      <td>3</td>\n",
       "      <td>Peter, Master. Michael J</td>\n",
       "      <td>male</td>\n",
       "      <td>NaN</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>2668</td>\n",
       "      <td>22.3583</td>\n",
       "      <td>NaN</td>\n",
       "      <td>C</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>1309 rows × 12 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "      PassengerId  Survived  Pclass  \\\n",
       "0               1       0.0       3   \n",
       "1               2       1.0       1   \n",
       "2               3       1.0       3   \n",
       "3               4       1.0       1   \n",
       "4               5       0.0       3   \n",
       "...           ...       ...     ...   \n",
       "1304         1305       NaN       3   \n",
       "1305         1306       NaN       1   \n",
       "1306         1307       NaN       3   \n",
       "1307         1308       NaN       3   \n",
       "1308         1309       NaN       3   \n",
       "\n",
       "                                                   Name     Sex   Age  SibSp  \\\n",
       "0                               Braund, Mr. Owen Harris    male  22.0      1   \n",
       "1     Cumings, Mrs. John Bradley (Florence Briggs Th...  female  38.0      1   \n",
       "2                                Heikkinen, Miss. Laina  female  26.0      0   \n",
       "3          Futrelle, Mrs. Jacques Heath (Lily May Peel)  female  35.0      1   \n",
       "4                              Allen, Mr. William Henry    male  35.0      0   \n",
       "...                                                 ...     ...   ...    ...   \n",
       "1304                                 Spector, Mr. Woolf    male   NaN      0   \n",
       "1305                       Oliva y Ocana, Dona. Fermina  female  39.0      0   \n",
       "1306                       Saether, Mr. Simon Sivertsen    male  38.5      0   \n",
       "1307                                Ware, Mr. Frederick    male   NaN      0   \n",
       "1308                           Peter, Master. Michael J    male   NaN      1   \n",
       "\n",
       "      Parch              Ticket      Fare Cabin Embarked  \n",
       "0         0           A/5 21171    7.2500   NaN        S  \n",
       "1         0            PC 17599   71.2833   C85        C  \n",
       "2         0    STON/O2. 3101282    7.9250   NaN        S  \n",
       "3         0              113803   53.1000  C123        S  \n",
       "4         0              373450    8.0500   NaN        S  \n",
       "...     ...                 ...       ...   ...      ...  \n",
       "1304      0           A.5. 3236    8.0500   NaN        S  \n",
       "1305      0            PC 17758  108.9000  C105        C  \n",
       "1306      0  SOTON/O.Q. 3101262    7.2500   NaN        S  \n",
       "1307      0              359309    8.0500   NaN        S  \n",
       "1308      1                2668   22.3583   NaN        C  \n",
       "\n",
       "[1309 rows x 12 columns]"
      ]
     },
     "execution_count": 292,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
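    "# Concatenate train and test so that the same cleaning steps apply to both\n",
    "# (Survived becomes float, with NaN for the 418 test rows)\n",
    "data = pd.concat([train, test], ignore_index=True)\n",
    "pred_id = test['PassengerId']  # keep the test IDs for the submission file\n",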
    "data"
   ]
  },
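  {
   "cell_type": "markdown",
   "id": "sketch-split-back",
   "metadata": {},
   "source": [
    "Because train and test were concatenated with `ignore_index=True`, the first `len(train)` rows (891) of `data` are the training passengers and the remaining 418 rows are the test passengers. These are also the rows whose `Survived` is NaN. The following is a minimal sketch of splitting a cleaned frame back into the two parts. The helper `split_back` and the toy frames are hypothetical.\n",
    "\n",
    "```python\n",
    "import pandas as pd\n",
    "\n",
    "\n",
    "def split_back(data, n_train):\n",
    "    # pd.concat(..., ignore_index=True) keeps row order: train rows first, then test rows\n",
    "    train_part = data.iloc[:n_train].copy()\n",
    "    test_part = data.iloc[n_train:].drop(columns='Survived').copy()\n",
    "    # Survived was upcast to float by the NaNs of the test rows; restore integer labels\n",
    "    train_part['Survived'] = train_part['Survived'].astype(int)\n",
    "    return train_part, test_part\n",
    "\n",
    "\n",
    "# Toy stand-ins for train.csv and test.csv\n",
    "toy_train = pd.DataFrame({'PassengerId': [1, 2, 3], 'Survived': [0, 1, 1], 'Age': [22.0, 38.0, 26.0]})\n",
    "toy_test = pd.DataFrame({'PassengerId': [4, 5], 'Age': [35.0, None]})\n",
    "combined = pd.concat([toy_train, toy_test], ignore_index=True)\n",
    "tr, te = split_back(combined, len(toy_train))\n",
    "```\n",
    "\n",
    "With the real frames this would be `split_back(data, len(train))`. An equivalent mask-based split uses `data['Survived'].notna()`."
   ]
  },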
  {
   "cell_type": "markdown",
   "id": "0ec0148e",
   "metadata": {},
   "source": [
    "### Exploratory Data Analysis (EDA)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 293,
   "id": "e43598ae",
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<class 'pandas.core.frame.DataFrame'>\n",
      "RangeIndex: 891 entries, 0 to 890\n",
      "Data columns (total 12 columns):\n",
      " #   Column       Non-Null Count  Dtype  \n",
      "---  ------       --------------  -----  \n",
      " 0   PassengerId  891 non-null    int64  \n",
      " 1   Survived     891 non-null    int64  \n",
      " 2   Pclass       891 non-null    int64  \n",
      " 3   Name         891 non-null    object \n",
      " 4   Sex          891 non-null    object \n",
      " 5   Age          714 non-null    float64\n",
      " 6   SibSp        891 non-null    int64  \n",
      " 7   Parch        891 non-null    int64  \n",
      " 8   Ticket       891 non-null    object \n",
      " 9   Fare         891 non-null    float64\n",
      " 10  Cabin        204 non-null    object \n",
      " 11  Embarked     889 non-null    object \n",
      "dtypes: float64(2), int64(5), object(5)\n",
      "memory usage: 83.7+ KB\n"
     ]
    }
   ],
   "source": [
    "# Overview of the training data: column dtypes and non-null counts\n",
    "train.info()"
   ]
  },
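  {
   "cell_type": "markdown",
   "id": "sketch-dtype-split",
   "metadata": {},
   "source": [
    "`info()` shows seven numeric columns (int64 and float64) and five `object` columns (`Name`, `Sex`, `Ticket`, `Cabin`, `Embarked`). `select_dtypes` is a quick way to separate the two groups, for example before deciding which columns to one-hot encode. The following is a minimal sketch on a made-up frame. The helper name `split_by_dtype` is hypothetical. Dtype alone is not the whole story: `Pclass` is stored as int64 but is really an ordinal category.\n",
    "\n",
    "```python\n",
    "import pandas as pd\n",
    "\n",
    "\n",
    "def split_by_dtype(df):\n",
    "    # Numeric (int/float) columns vs. everything else (text/object columns)\n",
    "    numeric = df.select_dtypes(include='number').columns.tolist()\n",
    "    other = df.select_dtypes(exclude='number').columns.tolist()\n",
    "    return numeric, other\n",
    "\n",
    "\n",
    "toy = pd.DataFrame({\n",
    "    'Pclass': [3, 1],\n",
    "    'Sex': ['male', 'female'],\n",
    "    'Age': [22.0, 38.0],\n",
    "    'Embarked': ['S', 'C'],\n",
    "})\n",
    "num_cols, cat_cols = split_by_dtype(toy)\n",
    "```"
   ]
  },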
  {
   "cell_type": "code",
   "execution_count": 294,
   "id": "371d4bd3",
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "PassengerId      0\n",
       "Survived         0\n",
       "Pclass           0\n",
       "Name             0\n",
       "Sex              0\n",
       "Age            177\n",
       "SibSp            0\n",
       "Parch            0\n",
       "Ticket           0\n",
       "Fare             0\n",
       "Cabin          687\n",
       "Embarked         2\n",
       "dtype: int64"
      ]
     },
     "execution_count": 294,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "train.isnull().sum()"
   ]
  },
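  {
   "cell_type": "markdown",
   "id": "sketch-missing-report",
   "metadata": {},
   "source": [
    "Raw counts are easier to judge as proportions of the 891 training rows. From the output above, `Cabin` is missing in 687 rows (77.1%), `Age` in 177 rows (19.9%), and `Embarked` in 2 rows (0.2%). The following is a minimal sketch of a helper that builds such a table for any frame. The name `missing_report` and the toy data are hypothetical.\n",
    "\n",
    "```python\n",
    "import pandas as pd\n",
    "\n",
    "\n",
    "def missing_report(df):\n",
    "    # Count and percentage of missing values, only for columns that have any\n",
    "    counts = df.isnull().sum()\n",
    "    counts = counts[counts > 0].sort_values(ascending=False)\n",
    "    return pd.DataFrame({\n",
    "        'missing': counts,\n",
    "        'percent': (counts / len(df) * 100).round(1),\n",
    "    })\n",
    "\n",
    "\n",
    "toy = pd.DataFrame({\n",
    "    'Age': [22.0, None, 26.0, None],\n",
    "    'Cabin': [None, 'C85', None, None],\n",
    "    'Fare': [7.25, 71.28, 7.93, 8.05],\n",
    "})\n",
    "report = missing_report(toy)\n",
    "```"
   ]
  },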
  {
   "cell_type": "code",
   "execution_count": 295,
   "id": "65b7f50b",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "PassengerId      0\n",
       "Pclass           0\n",
       "Name             0\n",
       "Sex              0\n",
       "Age             86\n",
       "SibSp            0\n",
       "Parch            0\n",
       "Ticket           0\n",
       "Fare             1\n",
       "Cabin          327\n",
       "Embarked         0\n",
       "dtype: int64"
      ]
     },
     "execution_count": 295,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "test.isnull().sum()"
   ]
  },
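  {
   "cell_type": "markdown",
   "id": "sketch-missing-compare",
   "metadata": {},
   "source": [
    "Comparing the train and test missing-value counts above shows the following:\n",
    "\n",
    "- `Age` and `Cabin` are missing in both sets.\n",
    "- `Embarked` is missing only in train (2 rows).\n",
    "- `Fare` is missing only in test (1 row).\n",
    "\n",
    "Any filling strategy therefore has to cover all four columns. In the combined `data` frame, `Age` is missing in 263 rows (177 + 86) and `Cabin` in 1014 rows (687 + 327). The following is a minimal sketch that lines up the two sets of counts side by side. The helper `compare_missing` and the toy frames are hypothetical.\n",
    "\n",
    "```python\n",
    "import pandas as pd\n",
    "\n",
    "\n",
    "def compare_missing(train_df, test_df):\n",
    "    # One column of missing counts per dataset; a column absent from a set shows NaN\n",
    "    table = pd.concat(\n",
    "        [train_df.isnull().sum(), test_df.isnull().sum()],\n",
    "        axis=1,\n",
    "        keys=['train', 'test'],\n",
    "    )\n",
    "    # Keep only columns with at least one missing value in either set\n",
    "    return table[table.fillna(0).sum(axis=1) > 0]\n",
    "\n",
    "\n",
    "toy_train = pd.DataFrame({'Survived': [0, 1], 'Fare': [7.25, 71.28], 'Embarked': ['S', None]})\n",
    "toy_test = pd.DataFrame({'Fare': [None, 8.05], 'Embarked': ['Q', 'S']})\n",
    "table = compare_missing(toy_train, toy_test)\n",
    "```"
   ]
  },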
  {
   "cell_type": "code",
   "execution_count": 296,
   "id": "d7edbd65",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<AxesSubplot:>"
      ]
     },
     "execution_count": 296,
     "metadata": {},
     "output_type": "execute_result"
    },
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAuoAAAETCAYAAABgEiPqAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuMywgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/Il7ecAAAACXBIWXMAAAsTAAALEwEAmpwYAABN2klEQVR4nO3dd5wV1f3/8dfZgkiRXqQooIiICBEV21clNowasaVp7JoYv/lpjFETC/aoSfxao9FgiRJ7LzF2g9hiC7YYFWwQbBQLZZfd8/vjc8advdxtd+/emTu8n4/HPNidewc+w9y585lzPueM894jIiIiIiLpUpF0ACIiIiIisjIl6iIiIiIiKaREXUREREQkhZSoi4iIiIikkBJ1EREREZEUUqIuIiIiIpJCStRFRERERFJIibqIiIiISAopURcRERERSSEl6tIuzrkxzrndk45DREREJGuqkg5Aypdzbg3gamBN55zz3t+TdEwiIiIiWaEWdSmY9/4L4FzgE+Bs59xeCYckIiIikhnOe590DFKGnHMOqPDe1znnvgOcgfXQTPXe351sdCIiIiLlTy3qUqjKkKR3AZYCnwFrAb9zzu2abGgiIiIi5U8t6tJmoR7dO+e6AS8B7wFfAQuBg4C3gF+rZV1ERESkcBpMKm0WkvQK4HJgBfAT4IPQwv4AcDbwW+fcCu/9/UnGKiIiIlKulKhLi5xz/YEa7/2i2OouwEjgBe/9nFCzjvf+9vDzLcAFzrlKzQYjIiIi0naqUZdmOeeGY6UsP3HO9Yy9tBRYgtWlR63sleHn24DrgZ7AFc65bUsZs4iIiEgWKFGXlnwEzABOAg6IJev1wCvAeOfcXqHlvM6ZaqBXeP0B4KmSRy0iIiKZFUpwM2+V2EkpnPe+FtgHuA84HzjQOdfb2yjkk4GPgd8CU8L7PTAc6A6c470/LCTwlUnELyIiItnhnFsdwHtfvyrkFpr1RZrknKv23teGmvMq4Fpgb+B44Ebv/afOuXWA+4E1gdeAD4FNgS+BCVEru9cHTURERNrBOVcFPAas6b0fGdZVeu/rko2s46hFXfJyzlWEJL0XcBXQDzgEuAM4D/ihc66P9/5d4FvAFcDXwCDsJNokaklXki4iIiJFUAncCfRyzj0NkPVee7Woy0pCkl4fas3/DvQB9vTezw7rrgP2onHLugMc0Nl7vyT8PVXe+xUJ7YaIiIhkjHOuE3AwcBrwnvd+i7A+ky3ralGXRkKZSpSkbwMsBw7HHmoU1awfROOW9d7e1MeSdKckXURERIohlL3gva/BSm1vASY65x4K6zPZsq4WdWkkVo/+T2xmlzrv/abhtW/uVsMd7TXAHsA5wMXe+6+SiVpERESyKj7WzTl3CzYubnWgLzZN9NPe+63D65lqWVeLujQSWsZrsQGi44EJzrmJ4bW62IONarCupxnAjlh9uoiIiEhRxZL0PwD/A5wIfBsYDUwFxmW1Zl2JukSt6NHP1QDe+5OAY8PqXzjn1gvrfU6yvjuwfXx9CeNePYpLREREsivkJ5sBj2O9/l9675cCF2D16ptnsQxGifoqLgz49M65yvCh7hm95r2/EDgV+B5wgnNuZFgfT9ZXhJr2ilLO7hL+/VuAx5xzG5Xq3xUREZFEdMKe07IsNBRGZS5fA5djD1ncwTn3GliynlSgxaREfRUWPuArnHPdgenYHeprzrkrnXPbA3jvz8KS9YOBE+PJevzv8t7XlzL28O//ASu5udY5N66U/76IiIh0jHw99CEhvwPY1Tn3PyEPiBoKlwBvAk8CFc654aWNuOMoUV9FhYEZdc65bliCPhh4GLgIK2f5s3PuJ/BNsn4ylqyf55wbklDYjXjvn8BiWg2Y5pwbX+ryGxERESme+PNXnHMVYfKKyAPYAxV/45zbPJpxzjnXD1gD+Asw0Xs/p/SRd4yqpAOQZMTKV84AvgAOBd4JH/habOrFxdGc6t77c5xzawCT
gHnJRb7S/OyrY70BZ2F1asdi3V8iIiJSRnJml5uKPem8p3PuTeBU7/2DzrnzscbD65xzfwTqsOmkNweO8t5/mVD4HULTM67inHOPAR967w8Mv++PPdDoZO/9b51zXYH+0d1pNEVSlMAnEG/uFE2jsDneBwNjsblVD/He/6vUsYmIiEj7OeduAyYCjwDVwNaAB4713t/pnNsbOACrAFgIzAV+nMVrv1rUVyE5Sa7Duol6YnVdOOf2w7qNfuO9P9c5txpwJvAf59w0731t1BKfRJIOjaZoOh3YFtgHeBb7LE/GWtWvcc4dDMwq5QBXERERaR/n3M+BbwE/wuZHr3POTcbKXkaFhsLbnXN3YfOpO+Ar7/3CxILuQErUVxFRC3jUrRQS2MXOuVeB74ZR0pdgXUvnhs1GY3e0c8Pc6sDKA0kTMgH4F/BciK3WOXcvNrj0ZuBi4Gjn3L9SEq+IiIi0bGPgHayxrS5MYnEDcBNwUdRQGEpkPkouzNLQYNJVQCxJrwaed84dEXv5QmApcBlwdhg4Spjy8I9AbXhPKoSBJdXY08jqvfc1zrmq0Mq/Ahvx/Rj2QIS/AmMSDFdERESaEJ/r3DnXJfT2jwYWee8XO+fWB57DSmAO994vdc6dGnrVVwlK1DMuDLyMkvRNga7Ahc6574W3vAGcC/wHOMw5N9U5dy1wDTZn6Y5JPjgg52FM0cDWWuA2YCfn3LYhQa8C8N4vx2rW7wcWYzchIiIikhKxZ7FEA0enAVuFHvBHsfnQp2BPP4+S9K+dc2thDXC9nHOdEwm+xDSYNMOiMpcwT/odWB1XD6xspA774F8bPuyjgV8A6wALsCkbzwnzrMdnWSl5/OHnaqC7935B+H1trMV8HWBv7/3MsL4fcBXWqn55vGRHREREkuOcq8J6w+tjOcqmwN+B0d77j51z22EPMFoXeMx7v3PYdgDwW2x82s7e+3cS2YkSU6KecWFA6AzgK+AkrAtpF+BwYDcsWb8m9v6u4aEC0e/fJMullJOk/w7YAkvKnwP+BDyITRV5LnbjcSF28zEKGx0+cVU5iUVERNIuNAreDTwPnBa7xkezu4z23n8U1h0FHANUYpNaDAXGY0n6t7M4u0tTNJg0+zYARgJHAs+GbqX7nXNzsBb2q5xzS733N4X3NyoVSeoRvLET+BZsQOttWHK+DzZv+kXe+9PDQ5n2A/YHaoAPgUlK0kVERFKlMzAEa1D70jn3h3Ct74LlHkucc5289zXe+8uccx8DU7BW9PnAy8DW3vs3kwk/GUrUs68PVu7yfphacTXv/XLv/RvOuT9jc5D+xTlX772/JdlQG3POHQBshj199KkwcPRubLaX6lCz/jLwcngAwhKslyhTDzsQEREpZ+F6vcg5ty1wO9ZaXuGc+z324MJlUWlrxHt/G3Cbc66P9/7zKH8pefAJU6KeUbE505/BniT6c+AZ7/3y2If9IewOdTlwg3Nujvf+n8lFvZKxwGfACyFJ3wAbZHIbNkNNvXNuBDZ49FNNwygiIpJK0eDRz5xzuwN3YXnJ19i4OB9KYD7Hyl08NuvcMGBOWF9T8qhTQLO+ZEQ0K0t8lpSYa4DtnXNngs2MEt63OVa7fj5WMvJD51xlE39Hh8qdVSYMOOkL1Hnvv3DOjQKewgaJHhymaDoKOAzorCRdREQkfULDYVTO+lusHHdv4F3gl8BRWA36lcArwOvYhBYvYg1ztZCaZ7iUnFrUMyA2crorMNU51xf40Dn3Z+/9h865a7DpjH4SHhxwBbAWdnLM897f5ZybCgxOQU368cAfwmwzLwIHhiemXoi1ph8WpmhaE9gGUJmLiEgBkposQFYd0bTK4edpwLeBv3nvFzrn9sJmpNsYu77/Hmtd74OVstZgOcrcRIJPCc36khEhSX8BqAeqgZ5Ya/ku3vu3QonIQWEZgs0x/go2A0w34HGsK2oqduNa8g+Gc24nbMDogd7768NUi7diCfnDsSmahgKnAzsC23vv
/1PqWPOJfyGJiKSRc64C6OS9XxZbN9J7/3aCYUkGxUpwcc71Ac4AHvHe3xl7Tx8sWV8TuMJ7f0EiwaaYSl/KWCgPifwE+ADYA2s9/3/YHenTzrn1vfezgfOA9bA72s3Dnx74HTAAuC48UCipu7cXsa6w3QG8958Cf8CeNrqtc+5059yfsB6B3YDdUpSkV8ZaDSYkHY+ISBOmAFc453oBOOeewMYodUkyKMmeWJL+B+Df2KxtH0avh+vm51gZzHzgaOfcGUk9YDGtlKiXsVAe0jWUrQzAWp3fCQ/5uRk4Afvwz3TOree9/zq0ojzpvX8LS3b/CnwH2CnJKQ1jJ+xpwHedc3sAeO/vxUp0LgZ2AjYE3sSmaErFPKo5c75fDvzVOXdQslG1XWhpi352uetEpLyF8/kLYFfgb865vwNrA0d775ckGpxkUriWfAl8jE3PGK2vip567r3/DNgTWIQl7T0TCDW1VPpS5pxz+2JJOcCx3vsLcxLHXbGHAvUHtovPP+qc2wd7kNBV3vt/lzDmb550Gu8aC7+PAm7BynJ+ktM92917/2Wa6ipzuvZuBb4FnAr8s5y6knM+M72wp8B+EHu90XESkfIUkvWdgXuxGb/28d7/LdmoVm0537+Z+a6NjZ+rxHr5T8US9q3D7C+VsWS9zjnXG7v2vJ9o4Cmj1rLy9yhW9vIJsKdzrlf4wFcAeO/vx1rWK7AyEmKv3QacVIokPbT87xb+3ShJvww40Tm3cfS+0NJ/G/BDrKUnPiPMV+HP1NSBx5L0X2MDYn4E3Oq9fzvs81rOuV5pbpnOuUhcgo1XeNU595Jz7lDn3ICsXDik9NL82V/VhCSwHmu4+QR7yMxpYXC+JCDn+/fnwEXOuV855zZKOLQ2yy1ZifYr/HkxVqPeHbjbOdc3J0mv9N4vUJK+MrWol5GmWpLDLC/7YIn4A9hgzCU5o623BJ5L4q49XKgfB2YDh4eSnR5YT8B4bPahK4EHvPdPhYGxz2MPNjrYl8EDDsLMOt2A74f53ScAFwCDsRuMs733tyYZY0ucczdhPSyXY/PXT6ahPOoX3vvFCYYnZSjqPXPOrYaNi6kDFnrvX084tDZJUy9eIXIHujvnBmNlCOOAS7G64T299/Ni78lMy245CD2yO2AtzusBs4Dfee+nJxpYK+XccBwKrIMNEP0/4G1vUypXYQ86OgYbU/fdeMt6MpGnnxL1MhG74K2ODRgdhNVz3RrKQboD+2PJ+v3kSdbD35PICeGcGw18FGLdxHv/Qli/HbAtcCz2QIPngDOBXwOjscT33VLH25x8/6fA37C6uguwi9/RwAxsFpsjsLrQbdJ60+Gcm4J9dn4GPOa9rw2tbHOxefanpjV2Saco0QvfTY9hPWQ9gWXYxfsK7/1/EwyxVZxznaMSPOfcT7Ek6i3g2bSMk2lOTgI1EOuRrPH2lMhO2M34ZViyvrv3/uOwfipwh/f+xaRiz7Kc47IJ1lh1FFb2uQ42E8oK4Hzv/bUJhdkqOY2CN2K9yx9jPfnrAScBd4akPErWj8J6dP4njE+TpnjvtaR8ASrCn92xu+w3gYVYq+d7wOTY60diLbg3AV2Tjj3E5WI/nwv8F/hRznvWwwaSvheWGdgF5ZSk48+JszL2867AqFj8H2JzwP4TG5wVve9Y4DWgV9LxN7NfJ2BPf+sffh8NfIpNj7l6WDcGqE46Vi3pX6LzBOstm4nN3LQn1sjwB6w2+kZgYNKxNhF/F+AGYEhs3c3h/J4dvpv+CeyddKytOQ7h50uwKXznYzNw/DBcM1w4NnPD64cA14Z9HJP0PmR9Cd+9FwLTsYf3Res3CNeNN4CDko6zlfvyx3Ad2TL8/pvwOVqEPYW0b1hfBZyCPdhoWNJxp31JPAAtrTxQsFq42D2K3a32AUaFL9d3og870AOrWa8HzkxB3JU5v08K8b4I/CDntQrs0cFnYC3RXwJjk96HfPuCPe31ReC3QLewrltI2NeKva8v1jJy
R5TwpmmhoVftHOCd8PN6ISG5KbZvh2ItPn2SjllLupfYZ6pz+J66Adgi9noV8OOQrE9NOt4m9uFHWMv/M8BAYBPgVaz3rwp7/sTLIdH4ftLxtmJ/bsIaEo4DjgemhWvEeUDXcH3ZDWsEWgD8BxiXdNxZWoA1sFbkeDK+PfZMk3nA/4V1lUBV+HmD8BmbBfw06X1oYf+2CefLruH347Eniu6PTRDxRdj/frH97J103OWwJB6AllYeKKvvnBMuENFJ/CPsyV0n5ry3D9ZCUpV03LGYfkfDzcRELFl/JX6RI9ZaGy6OqUwKsZbAaL73NcO6yjzv2xC4GmuZ3iDpuENMFU2snxwu3Cdjg8xuJfTIYFN/3hD2u3vS+6AlfQv2kLWq2O9VwD/CZ2oOMDzP+y/EegbXTzr+PPtTgbUAvg88HZKOy7AHBUXv2QFrgX4jzck6Nn7pHey5GVFPx5hwbC6IEkesZb1nSLhS2dNRzgtWzvlk/DMU1v8sJOqLgM1jxyK6zo8Orz8L9Eh6P5rZv3Wxks8uWG/NF8D+4bUJWGnrHOxmUQl6W/5vkw5ASysPlN2VLqfhbnS/8EV7Yvi9B3A2sbv1sD7xZD0k5jXAr2IXis3In6wnHm8L+3Iw8BGwdWxfeoSkfFzsfaeEfXuTlLRM0bhHYBAwNPb76iEZXxEuCFG51TDsZmNeGhMqLckvWM/eycDhhBs5rIX2F+EcWIxNx5b7Gdw3fC9slvQ+5OxPdF47rJb2bayc8Pxo32LvjZL1fwEHJB17FHfO7ydiDQtDwu/rYq3mN9JQ1rZWKWNcFRdCz0X4eVegZ+y1w7HBlU8Dm0THkYZkfRQwIul9iMXr8qyrItxIAHdjDyaMbgKrw3nyKVa7ntoy0DQumjYrhXKnOArewbqKNgrTHF4PnOy9Pzc8UGBbrCVkXHwjH6ZCTJL3/jngKezCHK17HusR6Ab8OswHn4p4WzACmOu9fwqocs5thdXgPgi87Jw7Jbzvcay7bxefggFnYWBfNHBpGtZF+bJz7lrnXD/v/VLgImxqzM2Aa51zd2DlLrtg+1GyufalPDjntsDm494b6wH7EsDbwOM/YTWrS4BLw3MQ4gPZK7AW9erSRt20+AA/bxnGxVjCsQDYzzk30Hu/3DlXHd7zCNba3g34WRg4m5gwqM+Hn/uH1YMBvPcfOefWxmbUegibgWupc+5wYk8qlY7h7YGDy8O17l7s87JGeO0qbPzWAODiMOGCB+rDZ/Itb08XT5RzriLEE33GujnnVosmu/DeL3b2hNsRWNIePQdlfayndnOsnHVhMntQppK+U9DSeKHhDnp1LJGN6j2HYXfb72JTnB0T22YUNvhyOk2UNpQw/tzWnGh/tsJqzo+Ovw9LCt/Eupj3TPr/vxX79zPCIFdswNUyrGXqh9jsKPXAeuG9iR6LJuL/HdZCeAo2uGxR+OxEg2IHYq0794blJGDdpOPWkr4lnLuLsGR8XGx9Rezn1bFZjz7HWte3xcZAbIMljDPScp7QuLV/P2Db8LMD/hdrCfwnDeVu8VK9bckp70l4X24Bjg8/7wR8jc2082n4vlojvDYYuA6rYV8j6bibOh5ZWoBe4VpRgw22XCP22s+wwcpPEspg0rCQZxIB4PdYg9S/gKuAjcP6LsCd2IDlg8O1/ypsjEf/pPelHJfEA9ASOxgNXa7dsRbot7Cnc0av/yBc8N7FatC7A1OwUoWXaEiKE7nw0Xh2l+qc1wZgA2EfxlqfKmL7u1WIP01de03Vcg/GalU/Bu7CWqWi1/Yn1sWchiV3P7Dylu9FnzestXwe1sI+Ova+1UoVY3uOh5bEjkd/bCrVq2ii3jT2fdQNS9bnYSUk72E9go9F3xNJJ2U0vrmYjtWdT4/2DUvWj8YaFJ4n1HCTU2+cQNyrEbtBCN+rfcL/cVRuNDgkTsuAl2PvHYqVtX1EysraaHzTNAkbfLwDoUGhXJZmriO9sUaTOlZO1n+K3QA/GI7vSmUm
Jd6HziGWc2LrbsNayK8P18EPsKkW9w2vD8fyl2VYr9kHpKQEtByXxAPQknNArI7tdSyhnQh0yXn9+yGp+iqczG9iLZ+JXvByvlivBf6Mzckbf8/3sBbn74TfK2iohe5cijgL2Jf/weoJc/elP7GBPeHieDU2gK5n0vuQZz9GACPDF+sGsfUVwI5YEjUTm2UgfsOV6EUiNwZgUFriKnBfmrpwl9WNCPAtbGBYNDVsdNPdB7v5OxebEWmzsL5rSEBew6Zn7Rn7u1IzLiWcw+8D2xEGs9O4Zv1oGmqJByUca3X4vnmYkGiHGAdgDTpb5hyvh8L624C/AI+EYzE+6f/3nP2K3zTdRMNAyyXYbDSpGAvQ0j7QeHD15mGZEFvXHWuVzpesH0pKejKx5x/8E7v5+zUwBGt025qGnvGtsLr0GmxedMLncN+waAxEe45B0gFoiR0M+5K9CGupGhI7CUZi0ziNCb8PwLqd9wLG0pDsJnLBy/li3Q4bNDobG0R2OzZ3cpfw+qNYS1q/3H1P+v8/xBFPbq/GWtY+xsp2/gKsnWebb2PTnS0ENkx6H/LEdx2WVNVgg0UPzj1+WLL+PnaTmMpWK6xe/u6k42hH/FEL82rYE2B3ADZNOq4C92UK1oL27di68Vgjwgrshrwem55tz/B6N2zq2IXhwt8prE9FiUNIpN7HythyS/jiyfr/wxpKHsF6pRL77sJKKD7D5ngfHdb1xWbc2BIb4BddR9bDbpYexR6KdwYpSQab2LfLwvfWd8I+bUzDTEKpPG+wso+1c9bdgN1s1IVr4l8IvSDhnIha1k8kJY08sdjjn50HsWT9NqyBcGDOeydgZTBPoMGixT0OSQegJeeAWGJ7fbggdMYeYDQ3XNy+Dr+v1PqWb10Csd8d4uyEDR45AGsBmYuV52yD3Yj8h5TN9JBnX/6CtZxNDheJG8MF4i5iD2jAWj6ew+ZUTsWc7zS+2bgoXOxOCRfmFSHWnXM/P+GC+CYpfABFSDjuBO4Kv6fixq6txwRrRXsWGyewArsBnAYMTjrGNu7PKCxRvw9rZT4LS14/wab8648NMH0Za7VdO2zXGRsDMR8bIJ9Y6QgrP+Nh35AwRUmUa+LPivA9PDLB2OONI6dgLc63YNMu9g//v6kpJSxg/wZjc4f/PxoaeYaE68tfSMnD/HKPCfa8jPg4pUux5PZgrFz1JOzG6hlCow5Ws/7bsN2xaftui33uRwF/D+fzq7HX42M1zgr7p+k9i3kMkg5gVV5iF+941/5tIbH6PdbqsTwkWBOxO9r/pOVLisYJ4XZYa8eOOSduN6wV50mslSdqEbk26fhz/+9j6w7CWgaiLrzjsFq7C7AWkbuiiyDWyvM9UlSXHtuPXsCpWMIU9brsGb5oZ7Jysu7IKbVKMPZ8N6MnYS2e3UlJK2wb92l1bCzG4+E82RI4MJwPN5CywXzN7Ed04Z6ClSNErefXE0phYu/9TTh3Noqt64zNUT6bhAZg0jjRPRir19497Mfmsdfi382/zD1nkjwGObGdhiXrN2P13J9hN0S7Yz2au2AlfP9L6AXJ992X4P5EvU3RZ2tcOBbRd/BobOadW2hI3H9CynowsZu9V7Eb1hHYA6UOjH3/VmJlIvOB22Pb9QZOJzZOKE1LLP5RwN/Csfl97PUol/lp2PfEbmKzuCQewKq6xL6YuoQLwA7h965YLeErWCtuvM7wFCzRTdVDZ4Cp2GPB/07jJD23xeqQcDF/nxQMLAn/15cR5q0N6zph4wB+HX4/Ems5/F44VmfTkJSkZpaHPPsWtdB8ELvYxROsKFnfMelY88QeT0Dic71/H2uBjmasqMi3TVoXbLDxW9jNXXThOyAcp+Oa+j9Iy5IvJqwsbztgYs766PvtjJC4rBU/Zliy3jPp/cAaRt7CynY2C0nGdHJqarFyw7uxBpTqJI8PjRtIBsd+PhWrQX8mfKaeD99dS8P5/imW7KYqico5HmdiZWHDwrE4AksOF2A3
IdE8/VuG4zE5iZhb2IfdsRlPvgj/57vmef+hWO/Nd/P9HUkvNNMQEs75B7GS0LNi6/tijVj/SurczuqSeACr4kLjbvAXsdbm42n81LsuNDwZshJ7SMWzwFVJx5+zL33CCVsPPB9bn3dAIvYY5VS0HIYLQj3wALEBVcBa4cLcD+u6/+bYhIv5grDdnaRoIFzOvn0vfK6WA3uFdfHP1x5YEv8GsH3S8TaxD38Nn63nsZa034f/90Ox3oJUtP63YX/OxFqRo4ee/DDszwnh997APknHmSfu+INXXCz+KOmOn9/xm6eRWA/C1U19HySwL/H4otlQdoitOycckz8RegGwcUDTsDrjRJNcGifp54b/2ymxdadgJUjPYmV73cP32SBs6tVU1Q7nHI+LsV6BTcJ14lmsbGQBcGvs89cbuAYb0Ltm0vsQiz/+Gf9u+P6tB36a59iNwkpZf5p03C0ckxOxaXzPDedLNGnFKCxZr6dhzMNN2I3iRknEneUl8QBW1QVLxF/BWqHH05AI5k6n1xvrtnwaS+obdREmEPdKd9pY68dT4aQ9gDx1p0lenFvYnx2xlo+/A9/KeW1MuHAcFFu3F3APsHPSF+0m9if+JbsH1rrxOQ3zpMd7PPbFWn6GJR13vs9Z+CwdQcMMFfPCZ+xTrFb1baxF9AZSUpIQjz/PuuOBxeHnqMwierJwBXYD8hApmSGBnBtqrIztL+FcfwLrSYumKaykIXHvhM0IMTN8xyU6bWwT+/Z7LJF6h5zxAeG1L7CE9x3s5upDEp4dhcaJ4K0hrqNYefDiyeEcuSWN31FN7M84LPneJ/Z5WQ/rfa3BynkqsLKRa0jvwP34TC9TsBvVBeQkr9jN3zxi0y+nbcES78+wmZqWhGvJzjQ8bXQk1si1ACvJnZSW766sLYkHsKou2KOp38DuTKOShG9hLaE/xrpXK7GSknexKRijL7DE63OxO+1+sd+HYYN/5gC7kecBCWldwpfPV4Sbptj6CeGCfX7Yv4HYtJO3ER69nfSS+1kg54YIa9l5I1y48yXr3ZLehxBHi0kcsFE4F67B6j7PC8dsFimaBzp2Pq9O49K1HbBejKil7Rex18aE9X/MPYYJ7cNGWJld1BuzWrgYv4H1dMzEShNepfE83hthrWvPYC1tUQtcanqegB7YOIH/hn2Kvlc7x96ze/iOuworfxuWdNyx2E7B5j7fiiZKDcN7PsVaPVM5i1Ms1ouxJPADGqaZjD43Y7GbpbnhO/oN7OYvFa22rfz+fR1rLNkLq1sfH77DPk3Z5yp+47RhOP83xgYnjwv/9++GcyNK1tcL5/rLNPE8BS1FODZJB7CqLthsHC+Fn9fE6tS/CF9I3wy2xOYwnULCUzDmxD4ZK6l4In5yYsns61hLT2qT9ZwLWvT/Ohmrf85N1k8Jx+M9bEaUBaRzdpdjQ1LxD2zwaLx2NZ6sR7MRRBfCNCSF8f2YgM0+swkNpV/xVqongD/Hfq8m4YfO5NsXrPXvpvC5iZdWXBk+T09jNZ1RK+FzYUm0xywW52SsFe1pwnMEsKRvrdh7DsYS3Vk0tKzvGLb5dez/IvHvrFjM0fm+Jg0zOV0Vez1Nn6V8A6qrsB69G8l5KFnuZwa7kX2PlM8ohPU0LQrHYvfY+ujz0wtrrT0C2BQYkHTM8fjCzy19//4Lu2bOw6bLnUUKxmnl25fw+xZYAt4jtm4YDdf33GRdLekdeXySDmBVWWhoZYv+3IeG+q6nsdaCn2MtCD8Pr22Y83ck3pIe4qjCZkKZg7UC9om9Fp3Mb2EtCKlK1nO+XPfDHq/dI/y+Ew3J+sax9x2CJVj/R0pap2hc4nIzVgZyS7iAf43NzTsy9p7oYlFPiuZOztmP6eEiUIe11r5MTks5Nn3pU0nH3cS+xAeIb4212n4SLso7xt73ZyyBmofd/L2O3YCk4imdsTgnYw0HT2DJ4a1YjXD8HDoEuwH8TWxd/MEtqXniaJ7XBoR9mg+cl3scE467
G9aLOiZnfU+s5fmyZrbdKvZz36T3pZk445+jg7BE9jliA5PTcCxaEXtrvn+nYOV79VgrdSpmbguxxb+DT8LK26YB03OPA42v73uTsidYZ3VJPICsL01dqLBBmIdjrVRn0XjmkSPCxT3xgTK5X5Q01NJXYQ82irry4y3ra4ck5GVSUloR4spNCt/BZkroG1u/M3mS9TQs5J914yKsO3Ji+P1n4WJQh5VRxC8W+2ADtNZLel/y7EeUvO6LdbtOCb9/Fj5PUUvoCeGimKqBpDTcgHfD6v7/hs1KcX04Hq8Qnsgb3rc91gp3PHYTlXjrM3Zz8cec82Q3LFlfCkyLrY/3cjwPPNKaz2uJ9yeeTO2K9QAcSeMu/kFYKds84PzY+sTq6bEa/3VDXN1yXuuOtXQ+R55WTKx35jpyEvw0LLRw04ZN7bcAK53aLLY+8V6/ZmJuy/fvD8L3byrHDWAlbV9gDyOLplz9We7xI6XX9ywviQeQ5YWGu9DVsTlfz8ASw/hFroqGlrRqrBvpH4TWq4Tjj1+wp8R+jifrx4UL+aPEZhTAZhpI5QM3sNaCD0KytFJdHQ3JeqMLRsIxd8W6VreLrZsAzKChjvh47EmQO9MwPeMlxHoB0vLFmpMsjcZKKH5MqP3HHpi1GLuhitcO/zxczHskvQ959ikqd5mFtTxFF7YDsJuLV2hmOsyWEpkOjj06ly/Oc3x2xmbfqQG+lxsv1svxGCnqPcv5jr0eu3l6PyQYT2PjgaLv5yhZf59mWqpLFPca2A3q/jTc/J0EbBv7/dBwbp8K9I9t2we4Fhs/0KfUsbfheByCPZPiUnKmLsQGxy7ABiluUsoYW4i/MzY97O+wMpxOWN12fBxH2Xz/hljis4BtED4324TfN8MS9neBQ2Pvi8751F7fs7gkHkCWlpyLW3QRiKZgfBm7m54bTogJNE6Ee2HJ/FMkOFMCNoB1SM66H4cvndNi66JkvRPWqlCP9Q6kuau1Cqt9fitc7JpsvcTKYOqxJCTx7j1s3uB64GHCAEWsxOKn2MxAe2IzIRwYXusRPnOfYgOXEv9SzffZCut3wJ7SGT0CfX1s8NVNNDzc5HDsZmV9UjQAK2c/eoRz/fLwe/z8PjAcv0ZlMGlaaJijuguWHMZbpHekoRXt+7HjuU64mKdq2thY3FdjCXiUXF1GQw/HprHv2TWxGXfeJJb8ljjWNbDSr0cJvanA8BDvs+E7IErWL8BabW/DykZ+ivXgpGkMTWdsSr98ZXpPYk+1rSdn5hOsVfpj7DqZeK8mdg3/RzgGr2FP2O4a1h9RLt+/sbj+nLPuMuwG5CZikyRgvTPPhM/kSsm6lhIet6QDyPISTopnsVrVoViL22Phy+nVcKFwYTk6fDHdErt4lLQbPHz5XBq+/PeNrV83doE7PbY+mk+5bziZl2MtIYlPwxYuEt/Dygt+Elu/J5YUbprz/vhNVjSIcXtSMJtI7OK8E9Zi8zgNyXrUwnF1OG5dY9s9giVRX5HwI52b+myF18aHYzIJa92MHm7SLby+DXYTuGWp425iX/LeuGE9Ys8C98XWxVsSo+nOZgJbJL0fIaY+5DwNkYabij/SOFn/Dpas12HlFXeH77NZJDwINud8j+atnhS+Z6Mncf4qfEedhPXgvIK1HEY9mgNJ6AnDWMnUbKxkKrehZNPw//5s9LnBGh1+gSW0tdgsME+QkikLsZuO+cAxsXWXhe+jrcPvZ9FQYvGrnO1/Gf4/hpYq5maOy1vYzdPWxHr3wuvRNTDV37+xY/IuVjbVK6wbivUw1YfPT1caXwu3xJL1t4Cjkt6HVXVJPICsLNg0i8dircsnYUn5/lgL+bDwntuwkosfY/XRLxISRmAI1tr7zaOGSxx/dxpa/c9j5TmUh5E/WXfYBfwRrOt8nRQci6iO891wYa4HXsAeYLQrlmhMyvf/jA2A3YuUDGIKnyMX+1zsjCW13yTrYf2jwLOx34djidRAEi4TacVna20soZqJ
taTfSMOMAn2wwU2PkVBLZ06s38Lq6XfPWV8VjtUVWCvad2i4warAkvgHwvIhcGl4LcmH/4zHBok+QOPBrv2wRKkm7E9uy/p8rCzpPuwZD4nW1zdxvj+H1df/NBybw7BZbL4fjsWR4X2PY8lIkmVH3cL58Tmhfjn3c4El6x+H/do89tkaipWODSUlT6zGEsKo1XxQWLdF+I6aEn7/FXaD8TOspK+enESQhB/OFD43N2EJ7LDY+nyz8TxESr9/Y8dkNnadXjPntQlYbrKcUNpG42R9C6yn6eU07MuquCQeQBYWrIvoIxrmS63H6ru+R0P92qVYkj4+XCh+Hd43g5yWwnxfBB0cfxdsQNjD2EC+vPO1Yy3rfwxx/zac/CPCummko0Qk+kJ6CKvrXDv8X38ZvqT6hdcfyNnOYa2592LJZKL1tuQ8AIvG3ceTsWT9MRpap07DWt3OwwZiXod1+Sc6NVsbPltRK+5cwmPBsdkRpoVzKvHBcVgX92s0tALegpUdxI9NXywR/xex+ttw7jyO1bWej01Hl1gigiWnnwJ3AEc2sa9RMpWbrO8UPn8Xx9Ylkug2c74vDMdqTWwO+KfDORKdV0PC+RF9V3dOOP4V2A3Q2TT06OU+/C6erG9FCgdZhv2ZQ6x8J6wfhTXkrIbdLC0E9g+vRU+IrgdOjm2T9BitIVhv0U+b+3xjJWBnYtf389P0/ZvnmEQ3TrmfrfWx3pyvou8tGifrm5HSksNVYUk8gHJfsO6wGmzqvvWw1p1fhgvxP7Eu2SHYPNYH0tASskk4eeqAmxPeh+NDrGPzfTmy8oONzgeWhS/b97AL/rgUHIuu4f/0oXCBjpLc1bGHatQD22EDlmqwJGUklkyOw2oJ55HwFIxhP94PF4krsRru3O7w3bCL+wzsITNdsAHIi7DWzrdTckza8tk6CHsIzVysdfR1rCUn8f0I8VVgMyPUA3fR8KTUl7ABo9EDpcaGz+EibPagq8PxeDm8fmI4b3omtB/jwv/xhTROpnJvnvrTMEAuN1nfioaW9KTKXZo73y8Jx2Yi1tr8KXBCbNsdwrHZnIQSEBpanv+GDc67E5vO8xwayr6aStafIgz8S8sSjsd70fEI6+KlX3uEP+/Aesm6xF6bgZUifU5KBsJiszHV00z5Iw3X8wFYz9R8bOaUtHz/5jsm0Xm7GtbrcWD4fRxWYpg3WdeS4HFMOoByXrBu8HrskdOrxy4Uq2GPNf8aS2w3Cz/vH9v2ICyxGpl7gUxgP24GHs6z/rAQ46vYDCgbhPW9sRa5i7Dp8lIx3RQN3dm/i62LWtD2xRLYDbCxA78MF+8vaJjPOvHHhOfsRz2WrNZjCcl0bO73IeF9W2DJ+hOE1mqsG3MSKZjasw2frb/FPlsTsZ6oM7EypcRbpEJc0bm9LpYoXYANTjweK2Grx1pGf471zFRjA7SewhL5aTT0JjyIJYkln0sZa/27BOs5GhzWRclGFXbDt25sf3uHfazBes5yp2tNsmSkpfN9EdYrMyic27dh39nrYDfAfyenDKvEx+GLcO4Oja27i5aT9U3Cef8wCfUEtOZ40PjGaVb4LuiHzcBzbWy7MViivjMperol1jK+nIanpeYreYn2sT/WeLIF6fr+beqYrIZdW54jtLKH9VGyvpBQDaAl+SXxAMp1CV+qp4aT4IzY+mhwydFYq9VQLBlfgNXefhtrxfkHjZ+wmFTXcRU2s8nTWD2ww3oGHg37tgBLNJZgiciwpP/vm9mXXljpRzR1Wbwk4RqslbpP+L0zVkd4GpZQHQ6snfQ+xPbjNKzX4hKsLvgPWCvzF1gZz51Yi8/PsV6ZO4hN3ZiGpY2frTlp/mzF9qkHVvKymFCOEy56B2N10vXY+JPTw+cr/nyBQVjr+iISKuUJx+QF4Lqc9UOwwX0vYTciM4Bvhdf6YmULKw36S/hYtOZ87xt+nxLOk4VY0v4pCT+GHkuKopulKIGqpnXJ+sak7HkIOcfj9LBuNexm
/NnYvl4bPmP7Ysn5n7BSsX5Jxd7E/qyHNbD9PrYubwsz1mL951LEVaRj8lr4Xo4afeJlLhuF4/URKXow06q8JB5AOS/YU+LiF4p41/CLwEOx3/fGWqWWYd17z5Oex4VvRsNTUm8PX6KfYy3mfbCWw++GL60r0xBzM/uyBtbDUQ+cGtadij2wJarnTv30UlhC+LuwH78M66qwOtwzwhdpNM1c1Pp+Pel7EFCbP1tpX2iYuvPXsXXdsMTwBayV9Mvwnl+E17fBSoAS7RLHblCfxLrp18RK9XbFkteojOcBLJF9l4bu8v7Aj0jJIOvY/rTmfI++Z7fGbkZ+TYqezhvbl6gkoVXJehqXnONxJpaAP4PdCMbLRKKndC7CEsLxSceeZ196YA1qcwllO/mOAzaY9xFSOitKc8ckz3ursXKZdcjzQC0tCR3DpAMo9yXnJPh1WHdfuMhFrTnR9F+bYF3++5PwTAl59uN/sCnLPsFGum9D4xrCLlit2/VJx9rGY/IU1mK70mj2nG1Sd+MR9uMP0RdsntfWA34TLuqfEspH0rZk6bMVi/lObKrFPuHC9gZ2c75mWDcmfAbjNbp7AcNTEPtWNCTlz2NlFLOAo8Pr1VgLdC3w2zzbp+I7KxZPc+d71FKduvO7iX3JUrK+NHwvDWrifXtjLeqJTsHYwr6Mw2q2XyRnpqfwei+srO0tUtIjW+gxwRobbsFKqlLfmLUqLYkHkIUlJ6F6HysRGRdea/JCkbaTISRMA/Ksd1h32CzguKb2J00L1lJ4brho3550PO3Yj3gSMjW2PrdVp2fSsbawH5n5bIUYjwgJ7unYTciLNDHojNgTANOyYGNMnsLqVE8iZwA1VrK3KP6ZS/OSlfM97Etusj4X64FKzVMtW7EPPbAZbL7p6Yi9lvqbjZx4J4dk/X2sN3MANnZjD6wXczEpGDjazmPSHRsw/iUpeRq3ltjxSTqArCwhoTon3LFekXQ8Rdif+OOFe2G1tbNJcatBnn3oSaw0Kel42rEfeZP18FqiM28UuD9l+9micS3nTBoeFDK8nI5BiL8qX/KHzW4zOdyA/CDpONuwP5k438O+xJP1x7CSqVTVcLdiH5r83iq3BRuk/yIN5atLsbEoT5KSh0wVekywlvQ/hZvcbyUdo5aVlyqkKLz3Xzjnzscufsc55+Z5789IOq5Cee9rAJxzk7AZar6LDVZ8P8m42sJ7v8g5dzY28Pc055z33p+ZdFxtFT5b0WdpqnOuznt/VnitLvzpEwuwjcr5s+W99y58kLDWtA2BGd77OQmH1mbe+xXOua8BnHNV4fcK7NkIU7Ha4VuTjLEtsnK+g53XzrlK732tc24n7MmWnyYdV1s0971Vbrz3L4bjMAx7FkolYcCl935BgqG1SZ5jUoUNFv8xsJX3/uXkopOmKFEvonChOAtrkZpazhcK51wnbCR7L2yg3/94719LNqq2i30x1QGnO+eWe+/PTzqutsrZjzOcczXluB9Q/p+t2E3RvcDJ2JSSxBL4shHFG5L0PljN8FHYoNNtYgljXZJxtlZWzndolKyvwG6ayk6Wvre8959jA+FfTDqW9sg5JidhreubKklPL1dm15Wy4JxbA7uAHwcc5r2/OuGQCuKcm4gNgL3Tez8v6XjawznXAzseN3rv30g6nkJlaD8y8dlyzh0JXAbs4L1/LOl4ChU+V9GTld/EnvmwImppTza6tsvKeZIVOh7p45zrCfwMuMN7/++Ew5FmKFHvIOGL6cdYvXrZXegi5dhK2BTnXIX3vj7pONorQ/tR9p8t59ww7Em9Pyrn8xzAOTcOWBu4z3tfX04t6flk5TzJCh2P9NExKQ9K1EugXFulRKT1snSe6wIuIpIOStRFRERERFKoIukA4pxz+zjnLnHOzXDOfeGc8865G5KOS0RERESk1NI268vJNDwJ7CNg/WTDERERERFJRqpa1IFfYI9FXwM4MuFYREREREQSk6oWde/949HPzrkkQxER
ERERSVTaWtRFRERERAQl6iIiIiIiqZSq0pdi2G677TIx3+SFF14IwDHHHJNoHO2l/UifrOyL9iNdsrIfkJ19ydJ+jB8/PukwiuaVV14p+2MC8MQTT5SqRrlNed2cOXM45JBDCvqHLr74YsaOHdvWzTr0/0Et6iIiIiKSCa+88krSIRRV5lrURWTVMX78eJ544omkw2i3rF1YRESSssMOO/D222+zbNkyampqWL58OUuXLmX58uW88847zW7bvXv3EkXZekrURaRsZaELOSpPEBGR9ttvv/348ssvC9p2wIABRY6m/ZSoi0jZUou6iIjE7bbbbtx4440FbfvGG28wYcKEIkfUPkrURaRsqUVdRETiCk3SAYYOHVrESIojVYm6c24KMCX8OjD8uYVz7trw82fe++NKHJaIiIiIlIHBgwczd+7cZt9z1113UVlZSWVlJRUVFd/8WVGRvjlWUpWoA+OBA3PWjQgLwPuAEnURAVT6IiIijW222WbceeedTb6+9tpr06NHjxJG1D6pStS996cBpyUchoiUCZW+iIhI3IwZM5p9vaKigiVLllBdXU1VVRXOlWo6+MKkKlEXERERESnUAQccwAUXXNDk63PmzGHXXXfN+9r06dMZNGhQR4VWECXqIiIikmpZ6D0D9aCVwu67787uu+++0vr6+nq23377Zretr6/vqLAKpkRdREREUi0r41FAY1I62syZMzn55JML2raysrLI0bRf+oa3ioiIiIgU4L333it429ra2uIFUiRqURcRERGRTJg4cSIvvPACixcvXmmg6OzZs5vdtnfv3h0ZWkGUqIuIiIhIJlx33XUFlxfV1dUVN5giUKIuIiIiIpnQmjrzvffem/r6eurq6r5ZRo8ezRprrFGCCNtGibqIlK2sDDDT4DIRkeIYM2YMTz75ZLPvuf3221da9+CDDzJmzBjWWWedjgqtIErURaRsZWHKNk3XJiJSPO0pX0njYFLN+iIiIiIimbBo0aKCt+3UqVPxAikSJeoiIiIikgnz5s0reNu+ffsWMZLiUKIuIiIiIplQU1NT8LZffvllESMpDtWoi4iIiEgmTJ06lauvvpqamppvZnSJZnh59NFHm922a9euJYqy9ZSoi4iIiEgmXHbZZdx///0Fbfv555/Ts2fP4gbUTip9EREREZFM6NatW0HbDRo0iF69ehU5mvZTi7qIiIiIZMLhhx/OxhtvvNI0jd57TjrpJMBmd6murm60bLLJJip9ERERERHpKDNnzmTq1KnNvqempmalQaf33HMPW265JRMnTuzI8NpMpS8iIiIikgmdO3cueNvx48cXL5AiUYu6iJSt8ePH88QTTyQdRru98sorSYcgIpIJ48eP5+ijj877lNE//vGPzW47e/ZsRo8e3VGhFUSJuoiUrVdeeYVjjjkm6TDa5cILL0w6BBGRzDjttNN45plnCtp2yJAhRY6m/ZSoi4iIiEgm9OjRo9nXx4wZw6WXXlqiaNpPibqIiIiIZMKkSZN48MEHm3z99ddfz9uTOXr0aHbaaSeccx0YXdspUReRsqUadRERiVu4cGGL77n77rvzrhs5ciQjRozoiLAKpkRdRMqWatRFRCRu5syZBW+7fPnyIkZSHJqeUUREREQyYdmyZQVvm8YnkypRFxEREZFMGDZsWEHbDRw4sMWBqElQ6YuIlC3VqIuISNyOO+7Irbfe2ubt5s+fz/z58xk+fHgHRFU4JeoiUrZUoy4iInFHHHFEQdsNGjRILeoiIiIibZWFm3LQjXkpHHnkkVx++eVt3m7evHkdEE37KVEXkbKl0heRVUNWznXQ+d7RpkyZQkVFBcuWLaOuro76+npqa2tZtmwZd955Z7PbLliwgN69e5co0tZRoi4iZSsLrWxqYRMRKZ7rrruOv/71rwVtu9pqqxU5mvbTrC8iIiIikgmbbrppwdum7amkoBZ1ERERSbks9J6BetBK4dlnny1428rKyiJGUhxK1EVERCTV
VKMupVBfX590CCtRoi4iZSsrF29duEVEiuOTTz5p8T2HHnoolZWVVFRUfPPniBEjGDx4cAkibBsl6iJStrLQHa6ucBGR4unevXuL75k2bdpK6zbaaCPOO+88Onfu3BFhFUyJuoiIiKRaFm7KQTfmpbD11ltzzz33tHm7WbNm8fbbbzN27NgOiKpwStRFREQk1bJS5gYqdetoL774YsHbVlSkbzJEJeoiIiKSampRl9Z66623Ct62NWUzpZa+WwcRERERkQK0Zy70pUuXFjGS4lCLuoiUrax0h6srXKR5WTnXQed7Rxs+fDgvv/xyQdtqekYRkSLKQne4usJFRIrnvvvuK3jb5cuXFzGS4lDpi4iIiIhkQqGNN5tvvjkbbrhhcYMpArWoi4iIiEgm7LLLLuyyyy55X9t1112ZMGECU6ZMobq6mk6dOlFdXU11dTXdu3enqip9aXH6IhIRERERKcDnn3/Oueeem3dg6JIlS5gxYwYzZszIu+3VV1/N8OHDOzrENlGiLiJlKysDzDS4TESkOC699FJeeOGFgrZVjbqIiIiISAd59tlnC962a9euRYykONSiLiJlS7O+iIhIXL9+/fjwww8L2vbtt99m6NChRY6ofZSoi4iIiEgmXHvttcyaNYu6urqVXjvuuOOa3XbEiBEdFVbBlKiLiIiISCY45+jXrx+1tbXU1dVRX19PXV1d3sQ914oVK0oQYdsoURcRERGRTJg+fTrTpk0raNtOnToVOZr2U6IuImVLs76IiEjcp59+WvC2r7/+OmuttVYRo2k/JeoiUrY0mFRk1ZCFcx10vpfCe++9V/C2a665ZvECKRIl6iIiIpJqWek9A/WgdbTjjjuO66+/nqVLl1JRUUFlZSUVFRVUVFTw8MMPN7vtl19+WaIoW0+JuoiIiIhkwr///e8WE/KmDB48uMjRtJ8SdRERERHJhMWLF7f4nurq6pXW7bzzzqmcnlFPJhURERGRTHj00UdbfE9tbe1Ky3333cfcuXNLEGHbqEVdREREUk2DSaW1evToUdB2AwcO1GBSERERkbbSYFJprUJnfZk/fz6vv/46Y8eOLW5A7aREXURERFJNLerSWnvssQdXXnllQdv279+/yNG0nxJ1ERERSTW1qEtrDR06tOBt6+rqihhJcShRF5GylZWLty7cIiLFMWvWrBbfc+yxx9KpUyeqq6u/+XPgwIEMGjSoBBG2jRJ1ESlbWegOV1e4SMuycK6DzvdSaM30jBdccEHe9TfeeCMDBw4sdkjtokRdREREUi0rvWegHrSOdvzxx7PddtvlLWM55ZRTmt32s88+U6IuIiIiItIRKisr2WKLLfK+1qVLF0aNGsX2229PVVUV1dXVVFZWAjat45gxY0oZaqsoUReRspWVVja1sImIFMe7777LHXfcwbJly6irq6O+vv6bZcmSJQwdOpTJkydTUVGBcy7pcFukRF1EREREMuGwww5r9vV77rmHe+65J+9rt956K3379u2IsAqmRF1EylYWBphpcJmISPI23HBDunXrlnQYK1GiLiIiIiKZ8O1vf5vHHnusydd79erFHXfcUcKI2keJuoiULdWoi4hIXEvXhIULFzJp0qSV1vfs2ZMrr7ySfv36dVBkhVGiLiJlS6UvIiISN3nyZB544IE2b7do0SLmz5+vRF1EREREpCMcd9xxTJ48mWXLllFTU8Py5ctZtmwZy5Yt45JLLml22zTOAqNEXUREREQyYfr06UybNq2gbQcNGlTkaNpPibqIiIikWhbK3EClbqXQnlbxuXPn0rt37yJG035K1EWkbGkwqciqISvnOuh872iF1KdHamtrixhJcVQkHYCIiIiISDGMHDky6RCKSi3qIlK2stAdrq5wEZHiefLJJwvedt111y1iJMWhRF1EylZWusPVFS7SvCzclINuzEuhoqKC+vr6grb13hc5mvZToi4iZSsLF29duEValpWbctCNeUe78847Oeigg/jqq69W
eq2lGvSKivRVhCtRFxERkVTLwk056Ma8FM4880wWLlxY0LYrVqwocjTtp0RdREREUk0t6tJaXbt2LXjb2bNnM2HChCJG035K1EWkbGXl4q0Lt4hIceQreWmtnj17Fi+QIklfMY6IiIiISAH+/e9/F7zt119/XcRIikOJuoiIiIhkwsSJEwvetnv37kWMpDhU+iIiZSsLA8w0uExEpHgKHUgK0KNHjyJGUhxqURcRERGRTFhvvfUK3vb9998vYiTFoRZ1ESlbGkwqIiJxb775ZsHbVlWlLy1OX0QiIq2k0hcREYlrz9NFBw0aVMRIikOJuoiIiIhkQmtmbjnyyCNXWjd8+HD69OnTESG1ixJ1ERERSbUs9J6BetBKoXfv3syePbvZ91x++eV5119xxRWMGjWqI8IqmBJ1ERERSbWsjEcBjUnpaD/4wQ944YUXCtp2xYoVRY6m/TTri4iIiIhkwsyZMwveVqUvIiIiIm2k0hdprQULFiQdQlEpURcREZFUU+mLtNZLL71U8LaffPIJAwcOLGI07adEXUREREQywTnX4nu23HJLqqurqaiooLKykoqKCjbccEPGjh1bggjbRom6iJStrLSyqYVNRKQ41lprLV577bVm3/P000+vtO6ll15iiy22oG/fvh0VWkE0mFREREREMqF///4FbffZZ5+xePHiIkfTfkrURURERCQT+vXrV/C2S5YsKWIkxaFEXUREREQy4bPPPit4248//riIkRSHatRFpGxlYco2TdcmIlI88+fPL3jbYcOGFS+QIlGLuoiIiIhkwuuvv17wtmmbmhGUqIuIiIhIRgwaNKjgbVesWFHESIpDpS8iUrY0PaOIiMQdffTRnHDCCQVt670vcjTtp0RdRMqWatRFRCSuPY03nTp1Kl4gRaJEXUREREQy4cADD2T99ddnxYoV1NbWNlpuvPHGZredN28eI0eOLFGkraNEXURERFItC71noB60Uujbty+rr746NTU1VFVV0alTJ+rq6qivr29x29ra2hJE2DZK1EVEREQkE8455xwee+yxgrZduHBhkaNpPyXqIiIikmpZGTgOGjze0VZfffWCt1133XWLGElxKFEXkbKVlYu3LtwiIsUxb968gretqEjfrOVK1EWkbGWhblU1qyIixVNZWVnwth988AH9+vUrYjTtl75bBxERERGRArzwwgsFb7vBBhsUMZLiUKIuIiIiIpkwYsSIgretq6srYiTFoURdRERERDLhwgsvpFevXlRXV3+zdOrUqVUPM/rggw9KEGHbqEZdRERERDLhpptuKniaxTTO+qIWdRERERHJhB49ehS87aJFi4oXSJGoRV1ERERSLQszPIFmeSqFBQsWFLzt119/XcRIikOJuoiIiKRaVp6ZAHpuQkfbfPPNufnmmwvatlu3bkWOpv2UqItI2crKxVsXbhGR4mjPQ4sWL16cunnUlaiLSNnKQne4usJFWpaFcx10vpfCvffeW/C2AwYMKGIkxaFEXURERFItK71noB60jrbtttvyyCOPFLTt7NmzGTduXJEjah/N+iIiIiIimTB48OCCt+3fv38RIykOJeoiIiIikgnt6bGYPXt28QIpEiXqIiIiIpIJXbp0SWTbjqJEXUREREQyoT3lK2uvvXYRIykODSYVERGRVNOsL9Ja7Rl0XFNTU7xAikSJuoiIiKSaZn2R1qqrqyt4206dOhUxkuJQoi4iIiKpphZ1aS3nXMHbLliwgN69excxmvZToi4iIiKpphZ1aa29996bBQsWsGjRIurq6qivr6euro66ujref//9Zrf94osvShRl6ylRFxEREZFMGDZsGGeffXbe1yZNmtTstu1pje8oStRFREREJBPq6ur4xz/+gfeeiooKKisrv/mzJSNHjixBhG2jRF1ERERSTTXq0lrXXHMN06dPL2jb+fPns+666xY5ovZRoi4iZSsrdauqWRVpXlbOddD53tGeeeaZgrddtmxZESMpDiXq
IlK2stDKphY2EZHi6datW8HbDhgwoIiRFIcSdREREUm1LNyUg27MS6E9TybVYFIRERGRNlLpi5TCnDlz6Nu3b9JhNKJEXUTKVlYu3rpwi4gUR3ta1EeMGFHESIpDibqIlK0sdIerK1xEpHjuv//+grdV6YuIiIiISAe56qqruPfee1m+fPlKr91yyy3NbltTU9NRYRVMibqIiIikWhZ6z0A9aKXQr18/DjnkkLyvtZSof/XVVx0RUrsoURcREZFUy8p4FNCYlDRbsWJF0iGsRIm6iIiIiGReRUUF9fX1VFdXr/TaTjvtxKhRoxKIqnlK1EVERCTVVPoirfXVV1/x/PPPU1dXB0B9fT21tbXU1tZSX18PQG1t7Urb3X///UyZMoV11123pPG2RIm6iJStrHSHqytcpHlZOddB53tHO+KII/jvf/9b0La9e/cucjTtp0RdRMpWFlrZ1MImIlI8Uat5c/r377/SVIwTJ05kjTXW6KiwCqZEXUTKVlZa2dTCJtK8LNyUg27MS2Hs2LF8/PHHzb7nk08+WWndPffcww9/+EMGDhzYUaEVRIm6iJStLFy8deEWaVlWbspBN+Ydbffdd+eRRx5p8vV11lmHX/3qV3Tq1Inq6mqqqiwV7tq1K927dy9VmK2mRF1EREREMmHWrFnNvv7uu++yePHildYPGDBAibqIiIiISEfZb7/92HjjjVmxYgW1tbUsW7aMJUuW8PXXX3PRRRcBcMIJJ+Td9vrrr2fIkCGlDLdFbUrUnXO7AkcDGwB9gP8CLwIXeO+fyXlvd+BEYG9gGLAUeB4433v/aJ6/ezTwA2A88C1gaHip2nufvhnoRURERKRZzrltgOOACcAg4GDv/bWx1x0wFTgC6AU8BxzlvX+9kH/v6quv5oYbbigo1mhKx7Zqbh+dc9XAWcAuwDrAF8DjwIne+w9a+rsr2hDEecB9wMbAg8BFwEvAHsBM59z+sff2Ap4FfgOsAK4Abg/bPuKcOzTPP7EzcCqwK/A1sKy1sYmIiIhIKnUDXsMaepfmef144JfAz4FNgU+Ah0ODb5u1VPrSnNyZYNqguX3sguW/Z4c/98Aaox90zrXYYN6qFnXn3EDsTuFjYCPv/Sex1yYBjwFnANEtzGlYq/sdwPejFnHn3G+AF4BLnHN/995/FPtn/gY8A8zy3i91zr0HrN2a+ERERCS7sjBwHFbNwePe+weABwCcc9fGXwut6ccA53rvbw/rDsSS9R8Bf2rrvzdu3LiCk/WlS/PdR7SsuX303i8Gdoyvc879BHgdGA282tzf3drSl7Wx1vfn4kl6COBx59yXQL/Y6j3Dn6fGy1a895845y4A/g84BEvuo9feamUsIiIisgrRrC+ZNRwYCDwUrQiNtf8AtqSNiXpdXV275kJ/9NFHWbBgAZttthmVlZUF/z2tEAW5sKU3tjZRfxuoATZzzvX13n8WvRDqcroDd8XeH01COTvP3xWt255Yoi4iIiIiq5QoX8yd+PxjYHBb/qK6ujqOP/54Xn212QbqZt12223cd999jB49mvPPP79DknXnXCfgD8C9OZUlebUqUffeL3DOnQBcALzhnLsL+Bwriv8u8DDwk9gmnwFrYndKb+T8dSPCn6Na82+LiDQlK61samETEWmf559/njfffJPa2tqC/w7vPUuXLuWNN97g+eefZ4sttihihBBq0m8AemL5c8vbeO/b8g9MAa7GRuVG3gGmeu//GnvfVcBhwG3AD7z3dWF9P6xGfS2gxnu/WjP/1ntYyY1mfREREREpc865r4D/jc2IMgJ4F9jMe//P2PvuBz7z3h/Y2r970qRJp2BjJOMTpdQDUx9//PGzir1dU3L3Mba+CrgRGAts572f36q/r7WJunPueOAc4GLgUmA+sD7wW2An4Hfe++PDe9fEptcZio2CfRToio10nYtNwbjMe796M//eeyhRFxEREcmEPIm6A+YBl3jvzwnrOmODSX/lvW/zYNKk5UvU
wxSNNwEbYkn6f1v797Vqekbn3HbAecA93vtjvfezvfdLvPcvYQNH5wK/DHdGhAA2BS7D6td/hk27eDOwb/hrGw1KFREREZFscc51c86Nd86Nx/LOtcLva3lrLb4QOME5t5dzbkPgWuAr4K9N/Z1p09w+hpb0W4HNgR8C3jk3MCxNNlhHWjuP+m7hz8dzX/DeL8EeZFSBPagoWv+x9/5/vffDvPedvPeDvPc/x8peAP6Z+3eJiIiISKZsArwcltWB08PP0YQi52OzAV6GlUevCezkvf+y9KEWrLl9HIJVlAzCHhL639jy/Zb+4tbO+hLVkvdr4vVofU0r/q4Dwp9lc6ckIiIiIm3nvX8CaPJJQqFV/bSwlKWW9rGF15rV2hb1GeHPI5xzjabLcc7tAmyFPUn06bCuwjnXbaUonfsxlqg/TePpHEVEREREJKa1Leq3AY8AOwBvOufuxAaTjsbKYhxwovf+8/D+LsDHzrmHsdG89VgyvwXwJrCv974+/g845/oCv4+t6hv+nOaci0a8nuu9/3cb9k9EREREpCy1ZdaXauAo4AfABlgyvgCrT7/Ye/9QznuvALbGanPAHpp0C3BhqGvP/fuHAXNaCGNS6F4QEREREcm0Ns2jLiIiIiIipdHaGnURERERESkhJeoiIiIiIimkRF1EREREJIWUqIuIiIiIpJASdRERERGRFFKiLiIiIiKSQkrURURERERSSIm6iIiIiEgKKVEXEREREUkhJeoiIiIiIin0/wGuqytZ/NVKwAAAAABJRU5ErkJggg==\n",
      "text/plain": [
       "<Figure size 864x216 with 2 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    }
   ],
   "source": [
    "#可视化查询缺失值\n",
    "msno.matrix(train,figsize=(12,3))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5d5a2c49",
   "metadata": {},
   "source": [
    "训练数据总共有891行,其中数据类型列：年龄（Age）、船舱号（Cabin）、Embarked（上船的港口编号）里面有缺失数据：\n",
    "\n",
    "1）年龄（Age）里面数据总数是714条，缺失了177，缺失率177/891=19.9%；\n",
    "\n",
    "2）船舱号（Cabin）里面数据总数是204条，缺失了687，缺失率177/891=77.1%；\n",
    "\n",
    "3）Embarked（上船的港口编号）只缺失了两条数据\n",
    "\n",
    "测试数据总共有418行,其中数据类型列：年龄（Age）、票价（Fare）、船舱号（Cabin）里面有缺失数据：\n",
    "\n",
    "1）年龄（Age）里面数据总数是418条，缺失了86，缺失率86/418=20.6%；\n",
    "\n",
    "2）票价（Fare）只缺失了一条数据\n",
    "\n",
    "3）船舱号（Cabin）里面数据总数是418条，缺失了327，缺失率327/418=78.2%；   "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 297,
   "id": "21f94deb",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>PassengerId</th>\n",
       "      <th>Survived</th>\n",
       "      <th>Pclass</th>\n",
       "      <th>Name</th>\n",
       "      <th>Sex</th>\n",
       "      <th>Age</th>\n",
       "      <th>SibSp</th>\n",
       "      <th>Parch</th>\n",
       "      <th>Ticket</th>\n",
       "      <th>Fare</th>\n",
       "      <th>Cabin</th>\n",
       "      <th>Embarked</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1</td>\n",
       "      <td>0.0</td>\n",
       "      <td>3</td>\n",
       "      <td>Braund, Mr. Owen Harris</td>\n",
       "      <td>male</td>\n",
       "      <td>22.0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>A/5 21171</td>\n",
       "      <td>7.2500</td>\n",
       "      <td>NaN</td>\n",
       "      <td>S</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1</td>\n",
       "      <td>Cumings, Mrs. John Bradley (Florence Briggs Th...</td>\n",
       "      <td>female</td>\n",
       "      <td>38.0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>PC 17599</td>\n",
       "      <td>71.2833</td>\n",
       "      <td>C85</td>\n",
       "      <td>C</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>3</td>\n",
       "      <td>1.0</td>\n",
       "      <td>3</td>\n",
       "      <td>Heikkinen, Miss. Laina</td>\n",
       "      <td>female</td>\n",
       "      <td>26.0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>STON/O2. 3101282</td>\n",
       "      <td>7.9250</td>\n",
       "      <td>NaN</td>\n",
       "      <td>S</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>4</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1</td>\n",
       "      <td>Futrelle, Mrs. Jacques Heath (Lily May Peel)</td>\n",
       "      <td>female</td>\n",
       "      <td>35.0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>113803</td>\n",
       "      <td>53.1000</td>\n",
       "      <td>C123</td>\n",
       "      <td>S</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>5</td>\n",
       "      <td>0.0</td>\n",
       "      <td>3</td>\n",
       "      <td>Allen, Mr. William Henry</td>\n",
       "      <td>male</td>\n",
       "      <td>35.0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>373450</td>\n",
       "      <td>8.0500</td>\n",
       "      <td>NaN</td>\n",
       "      <td>S</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1304</th>\n",
       "      <td>1305</td>\n",
       "      <td>NaN</td>\n",
       "      <td>3</td>\n",
       "      <td>Spector, Mr. Woolf</td>\n",
       "      <td>male</td>\n",
       "      <td>NaN</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>A.5. 3236</td>\n",
       "      <td>8.0500</td>\n",
       "      <td>NaN</td>\n",
       "      <td>S</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1305</th>\n",
       "      <td>1306</td>\n",
       "      <td>NaN</td>\n",
       "      <td>1</td>\n",
       "      <td>Oliva y Ocana, Dona. Fermina</td>\n",
       "      <td>female</td>\n",
       "      <td>39.0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>PC 17758</td>\n",
       "      <td>108.9000</td>\n",
       "      <td>C105</td>\n",
       "      <td>C</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1306</th>\n",
       "      <td>1307</td>\n",
       "      <td>NaN</td>\n",
       "      <td>3</td>\n",
       "      <td>Saether, Mr. Simon Sivertsen</td>\n",
       "      <td>male</td>\n",
       "      <td>38.5</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>SOTON/O.Q. 3101262</td>\n",
       "      <td>7.2500</td>\n",
       "      <td>NaN</td>\n",
       "      <td>S</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1307</th>\n",
       "      <td>1308</td>\n",
       "      <td>NaN</td>\n",
       "      <td>3</td>\n",
       "      <td>Ware, Mr. Frederick</td>\n",
       "      <td>male</td>\n",
       "      <td>NaN</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>359309</td>\n",
       "      <td>8.0500</td>\n",
       "      <td>NaN</td>\n",
       "      <td>S</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1308</th>\n",
       "      <td>1309</td>\n",
       "      <td>NaN</td>\n",
       "      <td>3</td>\n",
       "      <td>Peter, Master. Michael J</td>\n",
       "      <td>male</td>\n",
       "      <td>NaN</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>2668</td>\n",
       "      <td>22.3583</td>\n",
       "      <td>NaN</td>\n",
       "      <td>C</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>1307 rows × 12 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "      PassengerId  Survived  Pclass  \\\n",
       "0               1       0.0       3   \n",
       "1               2       1.0       1   \n",
       "2               3       1.0       3   \n",
       "3               4       1.0       1   \n",
       "4               5       0.0       3   \n",
       "...           ...       ...     ...   \n",
       "1304         1305       NaN       3   \n",
       "1305         1306       NaN       1   \n",
       "1306         1307       NaN       3   \n",
       "1307         1308       NaN       3   \n",
       "1308         1309       NaN       3   \n",
       "\n",
       "                                                   Name     Sex   Age  SibSp  \\\n",
       "0                               Braund, Mr. Owen Harris    male  22.0      1   \n",
       "1     Cumings, Mrs. John Bradley (Florence Briggs Th...  female  38.0      1   \n",
       "2                                Heikkinen, Miss. Laina  female  26.0      0   \n",
       "3          Futrelle, Mrs. Jacques Heath (Lily May Peel)  female  35.0      1   \n",
       "4                              Allen, Mr. William Henry    male  35.0      0   \n",
       "...                                                 ...     ...   ...    ...   \n",
       "1304                                 Spector, Mr. Woolf    male   NaN      0   \n",
       "1305                       Oliva y Ocana, Dona. Fermina  female  39.0      0   \n",
       "1306                       Saether, Mr. Simon Sivertsen    male  38.5      0   \n",
       "1307                                Ware, Mr. Frederick    male   NaN      0   \n",
       "1308                           Peter, Master. Michael J    male   NaN      1   \n",
       "\n",
       "      Parch              Ticket      Fare Cabin Embarked  \n",
       "0         0           A/5 21171    7.2500   NaN        S  \n",
       "1         0            PC 17599   71.2833   C85        C  \n",
       "2         0    STON/O2. 3101282    7.9250   NaN        S  \n",
       "3         0              113803   53.1000  C123        S  \n",
       "4         0              373450    8.0500   NaN        S  \n",
       "...     ...                 ...       ...   ...      ...  \n",
       "1304      0           A.5. 3236    8.0500   NaN        S  \n",
       "1305      0            PC 17758  108.9000  C105        C  \n",
       "1306      0  SOTON/O.Q. 3101262    7.2500   NaN        S  \n",
       "1307      0              359309    8.0500   NaN        S  \n",
       "1308      1                2668   22.3583   NaN        C  \n",
       "\n",
       "[1307 rows x 12 columns]"
      ]
     },
     "execution_count": 297,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Embarked缺失量为2，相对训练集数据量而言缺失很小，所以选择直接删除\n",
    "train=train.drop(train[train['Embarked'].isnull()].index)\n",
    "data=data.drop(data[data['Embarked'].isnull()].index)\n",
    "data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 298,
   "id": "1ed3f0e0",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "PassengerId      0\n",
      "Survived         0\n",
      "Pclass           0\n",
      "Name             0\n",
      "Sex              0\n",
      "Age            177\n",
      "SibSp            0\n",
      "Parch            0\n",
      "Ticket           0\n",
      "Fare             0\n",
      "Cabin          687\n",
      "Embarked         0\n",
      "dtype: int64\n",
      "PassengerId       0\n",
      "Survived        418\n",
      "Pclass            0\n",
      "Name              0\n",
      "Sex               0\n",
      "Age             263\n",
      "SibSp             0\n",
      "Parch             0\n",
      "Ticket            0\n",
      "Fare              1\n",
      "Cabin          1014\n",
      "Embarked          0\n",
      "dtype: int64\n"
     ]
    }
   ],
   "source": [
    "# 现在\n",
    "print(train.isnull().sum())\n",
    "print(data.isnull().sum())"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "887c834f",
   "metadata": {},
   "source": [
    "### 特征工程"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f4563418",
   "metadata": {},
   "source": [
    "##### 字符型数据处理\n",
    "\n",
    "（1）Name字符型数据处理\n",
    "\n",
    "    通过对Name特征观察，其中有用的信息只有头衔，如Ms、Miss，所以处理方法选定为提取乘客Name中的头衔，并生成Title特征归纳为六类"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 299,
   "id": "b9c99f2d",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>PassengerId</th>\n",
       "      <th>Survived</th>\n",
       "      <th>Pclass</th>\n",
       "      <th>Sex</th>\n",
       "      <th>Age</th>\n",
       "      <th>SibSp</th>\n",
       "      <th>Parch</th>\n",
       "      <th>Ticket</th>\n",
       "      <th>Fare</th>\n",
       "      <th>Cabin</th>\n",
       "      <th>Embarked</th>\n",
       "      <th>Title</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1</td>\n",
       "      <td>0.0</td>\n",
       "      <td>3</td>\n",
       "      <td>male</td>\n",
       "      <td>22.0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>A/5 21171</td>\n",
       "      <td>7.2500</td>\n",
       "      <td>NaN</td>\n",
       "      <td>S</td>\n",
       "      <td>Mr</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1</td>\n",
       "      <td>female</td>\n",
       "      <td>38.0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>PC 17599</td>\n",
       "      <td>71.2833</td>\n",
       "      <td>C85</td>\n",
       "      <td>C</td>\n",
       "      <td>Mrs</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>3</td>\n",
       "      <td>1.0</td>\n",
       "      <td>3</td>\n",
       "      <td>female</td>\n",
       "      <td>26.0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>STON/O2. 3101282</td>\n",
       "      <td>7.9250</td>\n",
       "      <td>NaN</td>\n",
       "      <td>S</td>\n",
       "      <td>Miss</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>4</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1</td>\n",
       "      <td>female</td>\n",
       "      <td>35.0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>113803</td>\n",
       "      <td>53.1000</td>\n",
       "      <td>C123</td>\n",
       "      <td>S</td>\n",
       "      <td>Mrs</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>5</td>\n",
       "      <td>0.0</td>\n",
       "      <td>3</td>\n",
       "      <td>male</td>\n",
       "      <td>35.0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>373450</td>\n",
       "      <td>8.0500</td>\n",
       "      <td>NaN</td>\n",
       "      <td>S</td>\n",
       "      <td>Mr</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1304</th>\n",
       "      <td>1305</td>\n",
       "      <td>NaN</td>\n",
       "      <td>3</td>\n",
       "      <td>male</td>\n",
       "      <td>NaN</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>A.5. 3236</td>\n",
       "      <td>8.0500</td>\n",
       "      <td>NaN</td>\n",
       "      <td>S</td>\n",
       "      <td>Mr</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1305</th>\n",
       "      <td>1306</td>\n",
       "      <td>NaN</td>\n",
       "      <td>1</td>\n",
       "      <td>female</td>\n",
       "      <td>39.0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>PC 17758</td>\n",
       "      <td>108.9000</td>\n",
       "      <td>C105</td>\n",
       "      <td>C</td>\n",
       "      <td>Royalty</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1306</th>\n",
       "      <td>1307</td>\n",
       "      <td>NaN</td>\n",
       "      <td>3</td>\n",
       "      <td>male</td>\n",
       "      <td>38.5</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>SOTON/O.Q. 3101262</td>\n",
       "      <td>7.2500</td>\n",
       "      <td>NaN</td>\n",
       "      <td>S</td>\n",
       "      <td>Mr</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1307</th>\n",
       "      <td>1308</td>\n",
       "      <td>NaN</td>\n",
       "      <td>3</td>\n",
       "      <td>male</td>\n",
       "      <td>NaN</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>359309</td>\n",
       "      <td>8.0500</td>\n",
       "      <td>NaN</td>\n",
       "      <td>S</td>\n",
       "      <td>Mr</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1308</th>\n",
       "      <td>1309</td>\n",
       "      <td>NaN</td>\n",
       "      <td>3</td>\n",
       "      <td>male</td>\n",
       "      <td>NaN</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>2668</td>\n",
       "      <td>22.3583</td>\n",
       "      <td>NaN</td>\n",
       "      <td>C</td>\n",
       "      <td>Master</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>1307 rows × 12 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "      PassengerId  Survived  Pclass     Sex   Age  SibSp  Parch  \\\n",
       "0               1       0.0       3    male  22.0      1      0   \n",
       "1               2       1.0       1  female  38.0      1      0   \n",
       "2               3       1.0       3  female  26.0      0      0   \n",
       "3               4       1.0       1  female  35.0      1      0   \n",
       "4               5       0.0       3    male  35.0      0      0   \n",
       "...           ...       ...     ...     ...   ...    ...    ...   \n",
       "1304         1305       NaN       3    male   NaN      0      0   \n",
       "1305         1306       NaN       1  female  39.0      0      0   \n",
       "1306         1307       NaN       3    male  38.5      0      0   \n",
       "1307         1308       NaN       3    male   NaN      0      0   \n",
       "1308         1309       NaN       3    male   NaN      1      1   \n",
       "\n",
       "                  Ticket      Fare Cabin Embarked    Title  \n",
       "0              A/5 21171    7.2500   NaN        S       Mr  \n",
       "1               PC 17599   71.2833   C85        C      Mrs  \n",
       "2       STON/O2. 3101282    7.9250   NaN        S     Miss  \n",
       "3                 113803   53.1000  C123        S      Mrs  \n",
       "4                 373450    8.0500   NaN        S       Mr  \n",
       "...                  ...       ...   ...      ...      ...  \n",
       "1304           A.5. 3236    8.0500   NaN        S       Mr  \n",
       "1305            PC 17758  108.9000  C105        C  Royalty  \n",
       "1306  SOTON/O.Q. 3101262    7.2500   NaN        S       Mr  \n",
       "1307              359309    8.0500   NaN        S       Mr  \n",
       "1308                2668   22.3583   NaN        C   Master  \n",
       "\n",
       "[1307 rows x 12 columns]"
      ]
     },
     "execution_count": 299,
     "metadata": {},
     "output_type": "execute_result"
    },
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAYIAAAEGCAYAAABo25JHAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuMywgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/Il7ecAAAACXBIWXMAAAsTAAALEwEAmpwYAAAVU0lEQVR4nO3de7hldX3f8fdnBgkityKTZ1IuQnGMUqSQTNBKIkTRQhqhSdCAGKNSeWxFGxWntFqCWKsZYkxUUMcEUVIhqLkMdCpJkWAeBGQm3OFBh4syE08dbsrN4si3f+w1uDlz5pw9M3vtfc5Z79fznOes2177+5sN57PXb631W6kqJEndtWDcBUiSxssgkKSOMwgkqeMMAknqOINAkjpuh3EXsLX22muv2n///cddhiTNKWvWrLm/qhZNtW7OBcH+++/P6tWrx12GJM0pSb6zpXV2DUlSxxkEktRxBoEkdZxBIEkdZxBIUscZBJLUca0FQZLzk3w/ya1bWJ8kH0+yNsnNSX6hrVokSVvW5hHBBcAx06w/FljS/JwKfKrFWiRJW9DaDWVV9fUk+0+zyfHAF6r3QIRrk+yR5Oeq6ntt1SRpbli2bBkTExMsXryY5cuXj7uceW+cdxbvDdzXN7+uWbZZECQ5ld5RA/vtt99IipM0PhMTE6xfv37cZXTGnDhZXFUrqmppVS1dtGjKoTIkSdtonEGwHti3b36fZpkkaYTGGQQrgTc2Vw+9FPiB5wckafRaO0eQ5CLgKGCvJOuA3weeBVBVnwZWAb8GrAUeB97cVi2SpC1r86qhk2ZYX8Db23p/SdJg5sTJYklSewwCSeo4g0CSOs4gkKSOMwgkqeMMAknqOINAkjrOIJCkjjMIJKnjDAJJ6jiDQJI6ziCQpI4zCCSp4wwCSeo4g0CSOs4gkKSOMwgkqeMMAknqOINAkjrOIJCkjjMIJKnjDAJJ6rgdxl2AxmfZsmVMTEywePFili9fPu5yJI2JQdBhExMTrF+/ftxlSBozu4YkqeMMAknqOINAkjrOIJCkjjMIJKnjDAJJ6jiDQJI6ziCQpI5rNQiSHJPkziRrk5wxxfr9klyZ5IYkNyf5tTbrkSRtrrUgSLIQOBc4FjgIOCnJQZM2ez9wSVUdBpwInNdWPZKkqbV5RHA4sLaq7q6qJ4GLgeMnbVPAbs307sA/tViPJGkKbQbB3sB9ffPrmmX9zgLekGQdsAp4x1Q7SnJqktVJVm/YsKGNWiWps8Y96NxJwAVV9dEk/xq4MMnBVfVU/0ZVtQJYAbB06dIaQ51j9d2zX9zKfjc+uCewAxsf/M7Q32O/M28Z6v4ktafNI4L1wL598/s0y/qdAlwCUFXXADsBe7VYkyRpkjaD4HpgSZIDkuxI72TwyknbfBd4JUCSF9ELAvt+JGmEWguCqtoInAZcDtxB7+qg25KcneS4ZrP3AG9NchNwEfCmqupc148kjVOr5wiqahW9k8D9y87sm74dOKLNGiRJ0xv3yWKpNT6KUxqMQaB5y0dxSoNxrCFJ6jiDQJI6zq4hSdvlk++5dOj7fPj+x57+Pez9n/bR1wx1f/OBRwSS1HEGgSR1nEEgSR1nEEhSxxkEktRxBoEkdZxBIEkd530E0hzkOEoaJoOgw/ba6SlgY/Nbc4njKGmYDIIOO/2Qh8ddgqRZwCDQ2B3xiXYeSbHjwzuygAXc9/B9Q3+Pq99x9VD3J42TJ4slqeMMAknqOINAkjrOIJCkjjMIJKnjDAJJ6jiDQJI6ziCQpI4zCCSp4wwCSeo4g0CSOs4gkKSOMwgkqeMMAknqOIeh1rxVOxdP8RS1c427FGlWMwg0b/34iB+PuwRpTmi1ayjJMUnuTLI2yRlb2OZ1SW5PcluSL7ZZjyRpc9MeESR5BNjicXVV7TbNaxcC5wKvAtYB1ydZWVW3922zBPgvwBFV9VCSn93K+iVJ22naIKiqXQGSfBD4HnAhEOBk4Odm
2PfhwNqqurvZx8XA8cDtfdu8FTi3qh5q3u/729AGSdJ2GLRr6LiqOq+qHqmqH1bVp+j9UZ/O3sB9ffPrmmX9XgC8IMnVSa5NcsyA9UiShmTQIHgsyclJFiZZkORk4LEhvP8OwBLgKOAk4LNJ9pi8UZJTk6xOsnrDhg1DeFtJ0iaDBsHrgdcB/7f5eW2zbDrrgX375vdplvVbB6ysqh9X1T3At+gFwzNU1YqqWlpVSxctWjRgyZKkQQx0+WhV3cvMXUGTXQ8sSXIAvQA4kc3D46/pHQl8Lsle9LqK7t7K95EkbYeBjgiSvCDJFUlubeYPSfL+6V5TVRuB04DLgTuAS6rqtiRnJzmu2exy4IEktwNXAu+tqge2tTGSpK036A1lnwXeC3wGoKpubq75/+/TvaiqVgGrJi07s2+6gHc3P5KkMRj0HMHOVfXNScs2DrsYSdLoDRoE9yc5kObmsiQn0LuvQJI0xw3aNfR2YAXwwiTrgXvo3VQmSZrjBg2C71TV0UmeAyyoqkfaLEqaT656+ZFD3+cTOyyEhCfWrWtl/0d+/aqh71Oz16BdQ/ckWQG8FHi0xXokSSM2aBC8EPg/9LqI7knyySS/3F5ZkqRRGSgIqurxqrqkqn4TOAzYDfDYUZLmgYGfR5DkyCTnAWuAnegNOSFJmuMGOlmc5F7gBuASenf/DmPAOUnSLDDoVUOHVNUPW61EkjQWMz2hbFlVLQc+lGSzJ5VV1Ttbq0ySNBIzHRHc0fxe3XYhkqTxmOlRlZc2k7dU1T+OoB5J0ogNetXQR5PckeSDSQ5utSJJ0kgN+mCaX02ymN4lo59JshvwF1U17TDUkqTNLVu2jImJCRYvXszy5cvHXc7g9xFU1URVfRx4G3AjcOb0r5AkTWViYoL169czMTEx7lKAwZ9Q9qIkZyW5BfgE8A16zyCWJM1xg95HcD5wMfBvquqfWqxHkjRiMwZBkoXAPVX1JyOoR5I0YjN2DVXVT4B9k+w4gnokSSM2aNfQPcDVSVYCT48zVFV/1EpVkqSRGTQI7mp+FgC7tleOJGnUBr2P4ANtFyJJGo9Bh6G+Ephq0LlXDL0iSdJIDdo1dHrf9E7AbwEbh1+OJGnUBu0aWjNp0dVJvtlCPZKkERu0a2jPvtkFwFJg91YqkiSN1KBdQ2v46TmCjcC9wCltFCRJGq2ZnlD2S8B9VXVAM/+79M4P3Avc3np1kqTWzXRn8WeAJwGSvBz4MPB54AfAinZLkySNwkxdQwur6sFm+reBFVX1FeArSW5stTJJ0kjMdESwMMmmsHgl8LW+dYOeX5AkzWIz/TG/CLgqyf3AE8A/ACR5Pr3uIUnSHDftEUFVfQh4D3AB8MtVtenKoQXAO2baeZJjktyZZG2SM6bZ7reSVJKlg5cuSRqGGbt3quraKZZ9a6bXNc8xOBd4FbAOuD7Jyqq6fdJ2uwL/Cbhu0KIlScMz8DOLt8HhwNqquruqnqT3hLPjp9jug8AfAD9qsRZJ0ha0GQR7A/f1za9rlj0tyS8A+1bV/5puR0lOTbI6yeoNGzYMv1JJ6rA2g2BaSRYAf0TvHMS0qmpFVS2tqqWLFi1qvzhJ6pA2g2A9sG/f/D7Nsk12BQ4G/j7JvcBLgZWeMJak0WrzXoDrgSVJDqAXACcCr9+0sqp+AOy1aT7J3wOnV9XqFmuS5oU9mgv49qjNHhMibbXWgqCqNiY5DbgcWAicX1W3JTkbWF1VK9t6b2m+e8NPnhp3CZpHWr07uKpWAasmLTtzC9se1WYtkqSpje1ksSRpdjAIJKnjDAJJ6jiDQJI6ziCQpI4zCCSp4wwCSeo4g0CSOs4gkKSOMwgkqeMMAknqOINAkjrOIJCkjjMIJKnjDAJJ6jiDQJI6ziCQpI4zCCSp4wwCSeq4Vp9ZLElz3YfecMLQ9/ng93/Q+z3xvaHv/31//uWtfo1HBJLUcQaBJHWcQSBJHWcQSFLHGQSS1HEGgSR1nJePSpp1nrPjbs/4
rXYZBJJmnSMO/M1xl9Apdg1JUscZBJLUcQaBJHWcQSBJHddqECQ5JsmdSdYmOWOK9e9OcnuSm5NckeR5bdYjSdpca0GQZCFwLnAscBBwUpKDJm12A7C0qg4Bvgwsb6seSdLU2jwiOBxYW1V3V9WTwMXA8f0bVNWVVfV4M3stsE+L9UiSptBmEOwN3Nc3v65ZtiWnAP97qhVJTk2yOsnqDRs2DLFESdKsOFmc5A3AUuCcqdZX1YqqWlpVSxctWjTa4iRpnmvzzuL1wL598/s0y54hydHA+4Ajq+r/tVjPVlu2bBkTExMsXryY5cs9fSFpfmozCK4HliQ5gF4AnAi8vn+DJIcBnwGOqarvt1jLNpmYmGD9+s2yS5Lmlda6hqpqI3AacDlwB3BJVd2W5OwkxzWbnQPsAnwpyY1JVrZVjyRpaq0OOldVq4BVk5ad2Td9dJvvL0ma2aw4WSxJGh+DQJI6ziCQpI6bNw+m+cX3fmHo+9z1/kdYCHz3/keGvv8157xxqPuTpG3lEYEkdZxBIEkdZxBIUscZBJLUcQaBJHWcQSBJHWcQSFLHzZv7CNrw1I7PecZvSZqPDIJpPLbk1eMuQZJaZ9eQJHWcRwSSNGI7LVzwjN/jZhBI0ogd9txdx13CM8yOOJIkjY1BIEkdZxBIUscZBJLUcQaBJHWcQSBJHWcQSFLHGQSS1HEGgSR1nEEgSR1nEEhSxxkEktRxBoEkdZxBIEkdZxBIUscZBJLUcQaBJHVcq0GQ5JgkdyZZm+SMKdb/TJK/aNZfl2T/NuuRJG2utSBIshA4FzgWOAg4KclBkzY7BXioqp4PfAz4g7bqkSRNrc0jgsOBtVV1d1U9CVwMHD9pm+OBzzfTXwZemSQt1iRJmiRV1c6OkxOAY6rq3zfzvwO8pKpO69vm1mabdc38Xc0290/a16nAqc3szwN3tlL01PYC7p9xq7nL9s1d87ltYPuG7XlVtWiqFTuMsIhtVlUrgBXjeO8kq6tq6TjeexRs39w1n9sGtm+U2uwaWg/s2ze/T7Nsym2S7ADsDjzQYk2SpEnaDILrgSVJDkiyI3AisHLSNiuB322mTwC+Vm31VUmSptRa11BVbUxyGnA5sBA4v6puS3I2sLqqVgJ/BlyYZC3wIL2wmG3G0iU1QrZv7prPbQPbNzKtnSyWJM0N3lksSR1nEEhSxxkEfZJUkj/vm98hyYYkl42zru0xH9sEM7cryXFTDWsymwzzs0myR5L/ONwKt12SnyS5McmtSS5NsseQ939Bc68SSX4vyc7D3P9W1LFPkr9J8u0kdyX5k+biGJJclOTmJO9K8sLm3+OGJAcm+cY46t0Sg+CZHgMOTvLsZv5VbH7JK/D05a5zwXxsE8zQrqpaWVUfGUtlgxv4sxnAHsBWBUF62vob8ERVHVpVB9O7EOTtLb0PwO8BIw+CZhSEvwT+uqqWAC8AdgE+lGQx8EtVdUhVfQz4d8CXq+qwqrqrql42hPcf2v+vBsHmVgH/tpk+Cbho04okZyW5MMnVwIXjKG4bDdymJP8yyTebby83J1kyjoIHNF273pTkk830a5tvpjcl+XqzbLa0c7o2HJ7kmuZb5DeS/HyzfKraPwIc2Cw7p9nuvUmub7b5QLNs//QGgvwCcCvPvNenLdcAezfvf2iSa5ua/irJP2u+If9jX7uXbJpPcmbThluTrGj++NK37TuBfw5cmeTKJG9J8sd969+a5GMttesVwI+q6nMAVfUT4F3AW4CvA3s3n8fv0wur/5DkyqauR/tq/M9Jbmn++/xIs+zAJF9NsibJPyR5YbP8giSfTnIdsHxoLakqf5of4FHgEHrjHu0E3AgcBVzWrD8LWAM8e9y1ttUm4BPAyc30jrO1rQO0603AJ5vpW4C9m+k9Zks7B2jDbsAOzfTRwFe2VDuwP3Br375fTe/yxND7wncZ8PJmu6eAl7bdtub3QuBL9IaSAbgZOLKZPhv442b6SuDQZvp/AO9opvfs
2+eFwGua6QuAE5rpe4G9muldgLuAZzXz3wBe3FIb3wl8bIrlNzSfa//ncRZw+hT/Psc2Ne7c317gCmBJM/0SevdYbWr3ZcDCYbZlLnUFjERV3ZzecNgn0fu2NtnKqnpitFVtn61s0zXA+5LsA/xlVX17RGVutQHatcnVwAVJLqF3KA+zpJ0ztGF34PPNN/4CntUs36z2bD5W46ubnxua+V2AJcB3ge9U1bXDbsskz05yI70jgTuAv0uyO70gvqrZ5vP0QgLgT4E3J3k38Nv0Bq0E+NUky+h1/ewJ3AZcuqU3rapHk3wN+PUkd9ALhFuG27ShOhr4XFU9DlBVDybZBXgZ8KW+z/Vn+l7zpeodfQyNXUNTWwn8IX2H6X0eG3EtwzJQm6rqi8BxwBPAqiSvGE1522y6dgFQVW8D3k+vG2RNkufOsnZuqQ0fBK6sXj/7a+gdNQz6GQX4cPX66Q+tqudX1Z8160bx3/ATVXUo8LymlpnOEXyF3rfjXwfWVNUDSXYCzqP3zf/FwGdp/g1m8Kf0jgjfDHxum6ofzO3AL/YvSLIbsB+wcTv2uwB4uO+zO7SqXtS3fuifn0EwtfOBD8zybxJba6A2JfkXwN1V9XHgb+gd4s5mM7YryYFVdV1VnQlsAPadZe3cUht256cnj9+0aeEWan8E2LXvtZcDb2m+XZJk7yQ/2075W9Z8030n8B56f8AeSvIrzerfAa5qtvtRU/On+Okf701/9O9v2nHCFt7mGW2vquvohf7rmeYLwhBcAeyc5I3w9DNYPkqv++bxAffxd/SOhHZu9rFnVf0QuCfJa5tlSfKvhl18P4NgClW1rvmfbN7Yija9Dri1Oaw/GPhCq4VtpwHbdU5zMu5Wev2xNzGL2jlNG5YDH05yA88cDmaz2qvqAeDq5qTqOVX1t8AXgWuS3ELvPMSujEFV3UDv3MBJ9MYWOyfJzcCh9M4TbPI/6Z2/+NvmdQ/TOwq4lV5IXL+Ft1gBfHXTidjGJcDVVfXQ0BoySfU67X8DeG2SbwPfAn4E/Net2MdX6R0Rrm4+z9ObVScDpyS5iV532ORnuQyVQ0xImhWSnA7sXlX/bQj7uozeidwrtr+y+c+TxZLGLslfAQfSuyRze/azB/BN4CZDYHAeEUhSx3mOQJI6ziCQpI4zCCSp4wwCaQZJntuMGXNjkokk65vpR5Oc12xzVJKX9b3mrOYqGGnW86ohaQbNNfqHQu8PPL1xYv5w0mZH0Rs7aFYNLywNwiMCaRs1RwGXNWMFvQ14V3Ok8CuTtptyJElptvCIQNpOVXVvkk/Td6SQ5JV9m6wA3tYMDvcSeuPnzPYxnNQhBoHUogFGkpTGziCQ2vX0SJLjLkTaEs8RSMMxefRPAMYxkqS0tQwCaTguBX5jqpPFjHgkSWlrOdaQJHWcRwSS1HEGgSR1nEEgSR1nEEhSxxkEktRxBoEkdZxBIEkd9/8BLFxjg75Dc4YAAAAASUVORK5CYII=\n",
      "text/plain": [
       "<Figure size 432x288 with 1 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    }
   ],
   "source": [
    "data['Title'] = data['Name'].apply(lambda x:x.split(',')[1].split('.')[0].strip())\n",
    "Title_Dict = {}\n",
    "Title_Dict.update(dict.fromkeys(['Capt', 'Col', 'Major', 'Dr', 'Rev'], 'Officer'))\n",
    "Title_Dict.update(dict.fromkeys(['Don', 'Sir', 'the Countess', 'Dona', 'Lady'], 'Royalty'))\n",
    "Title_Dict.update(dict.fromkeys(['Mme', 'Ms', 'Mrs'], 'Mrs'))\n",
    "Title_Dict.update(dict.fromkeys(['Mlle', 'Miss'], 'Miss'))\n",
    "Title_Dict.update(dict.fromkeys(['Mr'], 'Mr'))\n",
    "Title_Dict.update(dict.fromkeys(['Master','Jonkheer'], 'Master'))\n",
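    "# Mlle = Miss (unmarried), Mme = Mrs (married), Ms = marital status unspecified, Rev: reverend (clergyman), Master: usually a young boy\n",
    "# Major: army major, Lady: lady, Jonkheer: Dutch honorific for a nobleman, Dr: doctor (PhD or physician), Don: Spanish honorific for a nobleman\n",
    "# Dona: Spanish honorific for a noblewoman, the Countess: countess, Col: colonel, Capt: captain\n",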
    "\n",
    "data['Title'] = data['Title'].map(Title_Dict)\n",
    "sns.barplot(x=\"Title\", y=\"Survived\", data=data)\n",
    "del data['Name']\n",
    "data"
   ]
  },
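  {
   "cell_type": "markdown",
   "id": "3f9a1c2e",
   "metadata": {},
   "source": [
    "    A minimal sketch of the title parsing and mapping above, run on a few toy names (illustrative values; only pandas is assumed). It uses a reduced, hypothetical title_dict and shows that .map turns any title missing from the dictionary into NaN, which is why Title_Dict needs an entry for every title that appears in the data.\n",
    "\n",
    "```python\n",
    "import pandas as pd\n",
    "\n",
    "# Toy names in the 'Surname, Title. Given names' format (illustrative values)\n",
    "names = pd.Series([\n",
    "    'Braund, Mr. Owen Harris',\n",
    "    'Rothes, the Countess. of (Lucy Noel Martha Dyer-Edwards)',\n",
    "    'Oliva y Ocana, Dona. Fermina',\n",
    "    'Palsson, Master. Gosta Leonard',\n",
    "])\n",
    "\n",
    "# Same parsing rule as the cell above: text after the first comma, up to the first period\n",
    "titles = names.apply(lambda x: x.split(',')[1].split('.')[0].strip())\n",
    "\n",
    "# Hypothetical reduced mapping; titles missing from it become NaN under .map\n",
    "title_dict = {'Mr': 'Mr', 'Master': 'Master', 'the Countess': 'Royalty'}\n",
    "grouped = titles.map(title_dict)\n",
    "print(titles.tolist())   # ['Mr', 'the Countess', 'Dona', 'Master']\n",
    "print(grouped.tolist())  # ['Mr', 'Royalty', nan, 'Master']\n",
    "```"
   ]
  },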
  {
   "cell_type": "markdown",
   "id": "72d550e3",
   "metadata": {},
   "source": [
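    "(2) Processing the string feature Ticket\n",
    "\n",
    "    Inspecting the Ticket feature shows that some passengers share the same ticket number, i.e. they travelled on a group ticket. Ticket is therefore recoded as a categorical feature based on how many passengers share each ticket number."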
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 300,
   "id": "2b3e61a7",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "1     713\n",
       "2     262\n",
       "3     147\n",
       "4      64\n",
       "5      35\n",
       "7      35\n",
       "6      24\n",
       "8      16\n",
       "11     11\n",
       "Name: TicketGroup, dtype: int64"
      ]
     },
     "execution_count": 300,
     "metadata": {},
     "output_type": "execute_result"
    },
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAYIAAAEGCAYAAABo25JHAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuMywgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/Il7ecAAAACXBIWXMAAAsTAAALEwEAmpwYAAAVUElEQVR4nO3dfbRddX3n8feHIFURH2a447JJaDIaRUYt6DWlpYMPwDTUmVArtGG0o6s6Wa4hPj8Uly2L4jgdH6pja2xJLdVplUix7Vw1NjKCOOIDuUGekhQbA0JSlAuigE7F4Hf+ODt45uYk9wTuvucm+/1a66zsh985+3sDOZ+7f3vv3y9VhSSpuw4bdQGSpNEyCCSp4wwCSeo4g0CSOs4gkKSOO3zUBRyoo48+upYsWTLqMiTpoLJ58+Y7q2ps0L6DLgiWLFnC5OTkqMuQpINKkm/ta59dQ5LUcQaBJHWcQSBJHWcQSFLHGQSS1HEGgSR1nEEgSR1nEEhSxxkEktRxB92TxTp0XHny80Zy3Od98cqRHFearzwjkKSOMwgkqeMMAknquFaDIMmKJDcl2Z7k3AH735/k2ub1jSTfa7MeSdLeWrtYnGQBsBY4DdgJbEoyUVVb97Spqjf0tX8NcEJb9UiSBmvzjGA5sL2qdlTV/cB64Iz9tD8buLjFeiRJA7QZBAuB2/rWdzbb9pLk54ClwOX72L86yWSSyampqVkvVJK6bL5cLF4FXFpVDwzaWVXrqmq8qsbHxgbOtCZJeojaDIJdwOK+9UXNtkFWYbeQJI1Em0GwCViWZGmSI+h92U9Mb5TkWOAJwFdarEWStA+tBUFV7QbWABuBbcAlVbUlyQVJVvY1XQWsr6pqqxZJ0r61OtZQVW0ANkzbdt609fPbrEGStH/z5WKxJGlEDAJJ6jiDQJI6ziCQpI4zCCSp4wwCSeo4g0CSOs4gkKSOc/L6jjjpj08ayXGves1VIzmupOF5RiBJHWcQSFLHGQSS1HEGgSR1nEEgSR1nEEhSxxkEktRxBoEkdZxBIEkdZxBIUscZBJLUca0GQZIVSW5Ksj3Jufto8xtJtibZkuTjbdYjSdpba4POJVkArAVOA3YCm5JMVNXWvjbLgLcBJ1XV3Un+VVv1SJIGa/OMYDmwvap2VNX9wHrgjGlt/jOwtqruBqiqO1qsR5I0QJtBsBC4rW99Z7Ot31OBpya5KslXk6wY9EFJVieZTDI5NTXVUrmS1E2jvlh8OLAMeD5wNvBnSR4/vVFVrauq8aoaHxsbm9sKJekQ12YQ7AIW960varb12wlMVNWPq+pm4Bv0gkGSNEfaDIJNwLIkS5McAawCJqa1+Tt6ZwMkOZpeV9GOFmuSJE3TWhBU1W5gDbAR2AZcUlVbklyQZGXTbCNwV5KtwBXAW6rqrrZqkiTtrdU5i6tqA7Bh2rbz+pYLeGPzkiSNwKgvFkuSRswgkKSOMwgkqeMMAknqOINAkjrOIJCkjjMIJKnjDAJJ6jiDQJI6ziCQpI4zCCSp4wwCSeo4g0CSOs4gkKSOMwgkqeMMAknqOINAkjrOIJCkjjMIJKnjWg2CJCuS3JRke5JzB+x/RZKpJNc2r1e1WY8kaW+tTV6fZAGwFjgN2AlsSjJRVVunNf1EVa1pq45RuPWCZ47kuMecd8NIjivp4NbmGcFyYHtV7aiq+4H1wBktHk+S9BC0GQQLgdv61nc226Z7SZLrk1yaZPGgD0qyOslkksmpqak2apWkzhr1xeJPAUuq6lnAZcBHBzWqqnVVNV5V42NjY3NaoCQd6toMgl1A/2/4i5ptD6qqu6rqR83qh4HntFiPJGmANoNgE7AsydIkRwCrgIn+Bkme1Le6EtjWYj2SpAFau2uoqnYnWQNsBBYAF1XVliQXAJNVNQG8NslKYDfwXeAVbdUjSRqstSAAqKoNwIZp287rW34b8LY2a5Ak7d+oLxZLkkbMIJCkjjMIJKnjDAJJ
6jiDQJI6ziCQpI4zCCSp4wwCSeo4g0CSOs4gkKSO2+8QE0nuBWpf+6vqsbNekSRpTu03CKrqKIAk7wBuB/4SCPBS4En7eask6SAxbNfQyqr6UFXdW1X3VNWf4LSTknRIGDYIfpDkpUkWJDksyUuBH7RZmCRpbgwbBP8R+A3gO83rrGabJOkgN9R8BFV1C3YFSdIhaagzgiRPTfL5JDc2689K8rvtliZJmgvDdg39Gb2ZxH4MUFXX05uDWJJ0kBs2CB5dVVdP27Z7touRJM29YYPgziRPpnm4LMmZ9J4rkCQd5IYNgnOAC4Fjk+wCXg+8eqY3JVmR5KYk25Ocu592L0lSScaHrEeSNEuGumsI+FZVnZrkSOCwqrp3pjckWQCsBU4DdgKbkkxU1dZp7Y4CXgd87cBKlyTNhmHPCG5Osg44EbhvyPcsB7ZX1Y6quh9Yz+BbUN8BvAv45yE/V5I0i4YNgmOB/02vi+jmJB9M8sszvGchcFvf+s5m24OSPBtYXFWf2d8HJVmdZDLJ5NTU1JAlS5KGMVQQVNUPq+qSqvp14ATgscCVD+fASQ4D3ge8aYjjr6uq8aoaHxsbeziHlSRNM/R8BEmel+RDwGbgkfSGnNifXcDivvVFzbY9jgKeAXwhyS30up0mvGAsSXNrqIvFzRf114FLgLdU1TADzm0CliVZSi8AVtE3PlFVfR84uu8YXwDeXFWTwxYvSXr4hr1r6FlVdc+BfHBV7U6yBtgILAAuqqotSS4AJqtq4gBrlaT9Ov/88zt13Nky0wxlb62qdwPvTLLXTGVV9dr9vb+qNgAbpm07bx9tnz9jtZKkWTfTGcG25k+7ayTpEDXTVJWfahZvqKpr5qAeSdIcG/auoT9Msi3JO5I8o9WKJElzatjnCF4AvACYAi5McoPzEUjSoWHo5wiq6ttV9Uf0Bpu7Fhh40VeSdHAZdoaypyc5P8kNwB8DX6b3gJgk6SA37HMEF9EbNO5XquqfWqxHkjTHZgyCZjjpm6vqA3NQjyRpjs3YNVRVDwCLkxwxB/VIkubYsF1DNwNXJZkAHhxnqKre10pVkqQ5M2wQfLN5HUZv1FBJ0iFiqCCoqt9vuxBJ0mgMOwz1FcCgQedeOOsVSZLm1LBdQ2/uW34k8BJg9+yXI43eB9/0qZkbzbI1f/gf5vyY0h7Ddg1tnrbpqiRXt1CPJGmODds19C/6Vg8DxoHHtVKRJGlODds1tJmfXiPYDdwCvLKNgiRJc2umGcqeC9xWVUub9ZfTuz5wC7C19eokSa2b6cniC4H7AZKcDPwB8FHg+8C6dkuTJM2FmYJgQVV9t1n+TWBdVX2yqn4PeMpMH55kRZKbkmxPcu6A/a9u5ja4NsmXkhx34D+CJOnhmDEIkuzpPjoFuLxv30zdSguAtcDpwHHA2QO+6D9eVc+squOBdwMOWSFJc2ymi8UXA1cmuRP4v8D/AUjyFHrdQ/uzHNheVTua96wHzqDv2kJV3dPX/kgGPLQmSWrXTJPXvzPJ54EnAZ+rqj1f1IcBr5nhsxcCt/Wt7wR+YXqjJOcAbwSOAAY+qZxkNbAa4JhjjpnhsJKkAzHMMNRfraq/rar+UUe/UVXXzEYBVbW2qp4M/A4wcB7kqlpXVeNVNT42NjYbh5UkNYaes/gh2AUs7ltf1Gzbl/XAr7VYjyRpgDaDYBOwLMnSZlKbVcBEf4Mky/pWXwT8Y4v1SJIGGPbJ4gNWVbuTrAE2AguAi6pqS5ILgMmqmgDWJDkV+DFwN/DytuqRJA3WWhAAVNUGYMO0bef1Lb+uzeNLkmbWZteQJOkgYBBIUscZBJLUca1eI5gLz3nL/xzJcTe/5z+N5LiSNNs8I5CkjjMIJKnjDAJJ6jiDQJI6ziCQpI4zCCSp4wwCSeo4g0CSOs4gkKSOMwgkqeMMAknqOINAkjrOIJCkjjMIJKnjDAJJ6jiDQJI6rtWJaZKsAD4ALAA+XFX/fdr+NwKvAnYDU8BvV9W32qxJ
Ohi982VnjuS4b/+rS/e5b9s7L5/DSn7q6W9/4UiOeyhr7YwgyQJgLXA6cBxwdpLjpjX7OjBeVc8CLgXe3VY9kqTB2uwaWg5sr6odVXU/sB44o79BVV1RVT9sVr8KLGqxHknSAG0GwULgtr71nc22fXkl8NlBO5KsTjKZZHJqamoWS5QkzYuLxUleBowD7xm0v6rWVdV4VY2PjY3NbXGSdIhr82LxLmBx3/qiZtv/J8mpwNuB51XVj1qsR5I0QJtnBJuAZUmWJjkCWAVM9DdIcgJwIbCyqu5osRZJ0j60FgRVtRtYA2wEtgGXVNWWJBckWdk0ew/wGOCvk1ybZGIfHydJakmrzxFU1QZgw7Rt5/Utn9rm8SVJM5sXF4slSaNjEEhSxxkEktRxBoEkdZxBIEkdZxBIUscZBJLUcQaBJHWcQSBJHWcQSFLHGQSS1HEGgSR1nEEgSR1nEEhSxxkEktRxBoEkdZxBIEkdZxBIUscZBJLUca0GQZIVSW5Ksj3JuQP2n5zkmiS7k5zZZi2SpMFaC4IkC4C1wOnAccDZSY6b1uxW4BXAx9uqQ5K0f4e3+NnLge1VtQMgyXrgDGDrngZVdUuz7yct1iFJ2o82u4YWArf1re9sth2wJKuTTCaZnJqampXiJEk9B8XF4qpaV1XjVTU+NjY26nIk6ZDSZhDsAhb3rS9qtkmS5pE2g2ATsCzJ0iRHAKuAiRaPJ0l6CFoLgqraDawBNgLbgEuqakuSC5KsBEjy3CQ7gbOAC5NsaaseSdJgbd41RFVtADZM23Ze3/Imel1GkqQROSguFkuS2mMQSFLHGQSS1HEGgSR1nEEgSR1nEEhSxxkEktRxBoEkdZxBIEkdZxBIUscZBJLUcQaBJHWcQSBJHWcQSFLHGQSS1HEGgSR1nEEgSR1nEEhSxxkEktRxBoEkdVyrQZBkRZKbkmxPcu6A/T+T5BPN/q8lWdJmPZKkvbUWBEkWAGuB04HjgLOTHDet2SuBu6vqKcD7gXe1VY8kabA2zwiWA9urakdV3Q+sB86Y1uYM4KPN8qXAKUnSYk2SpGlSVe18cHImsKKqXtWs/xbwC1W1pq/NjU2bnc36N5s2d077rNXA6mb1acBNs1Tm0cCdM7aaW9Y0HGsa3nysy5qGM5s1/VxVjQ3acfgsHaBVVbUOWDfbn5tksqrGZ/tzHw5rGo41DW8+1mVNw5mrmtrsGtoFLO5bX9RsG9gmyeHA44C7WqxJkjRNm0GwCViWZGmSI4BVwMS0NhPAy5vlM4HLq62+KknSQK11DVXV7iRrgI3AAuCiqtqS5AJgsqomgD8H/jLJduC79MJiLs16d9MssKbhWNPw5mNd1jScOamptYvFkqSDg08WS1LHGQSS1HGdDIIkFyW5o3mOYeSSLE5yRZKtSbYked2oawJI8sgkVye5rqnr90dd0x5JFiT5epJPj7oWgCS3JLkhybVJJkddD0CSxye5NMk/JNmW5BfnQU1Pa/6O9rzuSfL6eVDXG5r/x29McnGSR46ghr2+l5Kc1dT1kySt3UbaySAAPgKsGHURfXYDb6qq44ATgXMGDMcxCj8CXlhVPw8cD6xIcuJoS3rQ64Btoy5imhdU1fHz6F70DwB/X1XHAj/PPPj7qqqbmr+j44HnAD8E/naUNSVZCLwWGK+qZ9C7uWWub1yBwd9LNwK/DnyxzQN3Mgiq6ov07lKaF6rq9qq6plm+l94/2IWjrQqq575m9RHNa+R3FyRZBLwI+PCoa5mvkjwOOJnenXlU1f1V9b2RFrW3U4BvVtW3Rl0IvTsoH9U8z/Ro4J/muoBB30tVta2qZmskhX3qZBDMZ80IrCcAXxtxKcCDXTDXAncAl1XVfKjrfwBvBX4y4jr6FfC5JJubIVFGbSkwBfxF04X24SRHjrqoaVYBF4+6iKraBbwXuBW4Hfh+VX1utFXNLYNgHknyGOCTwOur6p5R1wNQVQ80p/GLgOVJnjHKepL8e+COqto8yjoG
+OWqeja90XbPSXLyiOs5HHg28CdVdQLwA2CvoeBHpXnIdCXw1/OglifQGwBzKfCzwJFJXjbaquaWQTBPJHkEvRD4WFX9zajrma7pVriC0V9bOQlYmeQWeiPavjDJX422pAd/q6Sq7qDX5718tBWxE9jZdwZ3Kb1gmC9OB66pqu+MuhDgVODmqpqqqh8DfwP80ohrmlMGwTzQDL3958C2qnrfqOvZI8lYksc3y48CTgP+YZQ1VdXbqmpRVS2h17VweVWN9Le3JEcmOWrPMvDv6F3kG5mq+jZwW5KnNZtOAbaOsKTpzmYedAs1bgVOTPLo5t/iKcyDC+tzqZNBkORi4CvA05LsTPLKEZd0EvBb9H673XNb3a+OuCaAJwFXJLme3thRl1XVvLhdc555IvClJNcBVwOfqaq/H3FNAK8BPtb89zse+G+jLaenCcvT6P3mPXLNWdOlwDXADfS+F+d8uIlB30tJXpxkJ/CLwGeSbGzl2A4xIUnd1skzAknSTxkEktRxBoEkdZxBIEkdZxBIUscZBDokJfmXfbfifjvJrmb5viQfmuG99+1v/7S2z0/yS9O2vSzJ9c2okdc1wzs8/iH+KFLrWpuqUhqlqrqL3r3zJDkfuK+q3tvCoZ4P3Ad8uTnWCuANwOlVtSvJAnrzcj8R+F7/G5MsqKoHWqhJOiCeEahTmt/gP90sPybJXzTzCFyf5CXT2h6d5CtJXtQ8Zf3JJJua10nNAIGvBt7QnG38W+DtwJv7hpx4oKou2jOCZDNvwbuSXAOcleTs5vg3JnlX37Hv61s+M8lHmuWPJPnTJJNJvtGMvSQ9LJ4RqMt+j95Ik8+EBwcfo1l+IjAB/G5VXZbk48D7q+pLSY4BNlbV05P8KX1nG0n+Db0nVPfnrqp6dpKfBb5Kb1z+u+mNXvprVfV3M7x/Cb2xjJ5M78nvp1TVPx/Yjy79lGcE6rJTgbV7Vqrq7mbxEcDngbdW1WV9bT/YDMk9ATy2GS12n5I8szlT+GaS3+zb9Ynmz+cCX2gGO9sNfIzeHAIzuaSqflJV/wjsAI4d4j3SPhkE0t52A5uBX+nbdhhw4p7ZtapqYd+kPf220IzyWVU3NEN4fxZ4VF+bHwxRQ//YL9OnTZw+LozjxOhhMQjUZZcB5+xZ6esaKuC3gWOT/E6z7XP0BnHb0/b4ZvFe4Ki+z/wD4L3NLGp79IdAv6uB5zXXIhbQG5Hzymbfd5I8PclhwIunve+sJIcleTLwr4HWZ7DSoc0gUJf9V+AJzYXa64AX7NnR3M1zNr0RYf8LzZy2zUXlrfQuEgN8CnjxnovFVbUB+CPgs0m2Jvky8ACw16iRVXU7vclirgCuAzZX1f9qdp8LfJre3Ui3T3vrrfRC5LPAq70+oIfL0Uelg0hz99Cnq+rSUdeiQ4dnBJLUcZ4RSFLHeUYgSR1nEEhSxxkEktRxBoEkdZxBIEkd9/8AvPkeKRczzswAAAAASUVORK5CYII=\n",
      "text/plain": [
       "<Figure size 432x288 with 1 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    }
   ],
   "source": [
    "Ticket_Count = dict(data['Ticket'].value_counts())\n",
    "data['TicketGroup'] = data['Ticket'].apply(lambda x:Ticket_Count[x])\n",
    "sns.barplot(x='TicketGroup', y='Survived', data=data,ci=None)\n",
    "data['TicketGroup'].value_counts()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 301,
   "id": "5c9e8750",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAYIAAAEGCAYAAABo25JHAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuMywgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/Il7ecAAAACXBIWXMAAAsTAAALEwEAmpwYAAAT2UlEQVR4nO3df5BdZ33f8fdHMg6pMQmtt4xryUgFEeMCY2BxMnWKIbEbuaQyKSaRCylMSTVMERAIEDFJPEQ0kwESaBNEYoW4IWlAuKZJFyIqXDBOMRhrZfwDyRURsoOlknhtDNikiZH59o97hG/XV9orWWevpOf9mrmz5znnufd89/6xnz3POec5qSokSe1aMukCJEmTZRBIUuMMAklqnEEgSY0zCCSpcadMuoAjdcYZZ9SKFSsmXYYknVB27Nhxb1VNjdp2wgXBihUrmJ2dnXQZknRCSfKXh9rm0JAkNc4gkKTGGQSS1DiDQJIaZxBIUuMMAklqnEEgSY0zCCSpcQaBJDXuhLuzWNLRu/4FF066hGPuwj+/ftIlnPA8IpCkxhkEktQ4g0CSGtdrECRZnWR3kj1JNozY/t4kt3SvLyf5Rp/1SJIerbeTxUmWApuAi4F9wPYkM1W162CfqnrjUP/XAc/pqx5J0mh9HhGcD+ypqr1V9RCwBbj0MP0vBz7cYz2SpBH6DIKzgLuH2vu6dY+S5CnASuDTh9i+Lslsktm5ubljXqgktex4OVm8Frimqh4etbGqNlfVdFVNT02NfNKaJOko9RkE+4HlQ+1l3bpR1uKwkCRNRJ9BsB1YlWRlklMZ/LGfmd8pyTnAk4DP91iLJOkQeguCqjoArAe2AXcAV1fVziQbk6wZ6roW2FJV1VctkqRD63WuoaraCmydt+6Kee2391mDJOnwjpeTxZKkCTEIJKlxBoEkNc4gkKTGGQSS1DiDQJIaZxBIUuMMAklqnA+v10nvgt++YNIlHHM3vO6GSZegk4hHBJLUOINAkhpnEEhS4wwCSWqcQSBJjTMIJKlxBoEkNc4gkKTGGQSS1DiDQJIaZxBIUuN6DYIkq5PsTrInyYZD9PnpJLuS7EzyoT7rkSQ9Wm+TziVZCmwCLgb2AduTzFTVrqE+q4C3ARdU1f1J/mFf9UiSRuvziOB8YE9V7a2qh4AtwKXz+vw7YFNV3Q9QVff0WI8kaYQ+g+As4O6h9r5u3bCnA09PckOSG5OsHvVBSdYlmU0yOzc311O5ktSmSZ8sPgVYBbwQuBz4vSQ/OL9TVW2uqumqmp6amlrcCiXpJNdnEOwHlg+1l3Xrhu0DZqrqO1V1J/BlBsEgSVokfQbBdmBVkpVJTgXWAjPz+vwpg6MBkpzBYKhob481SZLm6S0IquoAsB7YBtwBXF1VO5NsTLKm67YNuC/JLuA64C1VdV9fNUmSHq3XZxZX1VZg67x1VwwtF/Cm7iVJmoBJnyyWJE2YQSBJjTMIJKlxBoEkNc4gkKTGGQSS1DiDQJIaZxBIUuMMAklqnEEgSY0zCCSpcQaBJDXOIJCkxhkEktQ4g0CSGmcQSFLjDAJJapxBIEmNMwgkqXG9BkGS1Ul2J9mTZMOI7a9KMpfklu71c33WI0l6tN4eXp9kKbAJuBjYB2xPMlNVu+Z1/UhVre+rjlZ9deOzJl3CMXf2FbdPugTppNTnEcH5wJ6q2ltVDwFbgEt73J8k6Sj0GQRnAXcPtfd16+Z7aZLbklyTZPmoD0qyLslsktm5ubk+apWkZk36ZPHHgBVV9WzgWuCDozpV1eaqmq6q6ampqUUtUJJOdn0GwX5g+D/8Zd2676mq+6rq77rmB4Dn9ViPJGmEPoNgO7AqycokpwJrgZnhDknOHGquAe7osR5J0gi9XTVUVQeSrAe2AUuBq6pqZ5KNwGxVzQCvT7IGOAB8HXhVX/VIkkbrLQgAqmorsHXeuiuGlt8GvK3PGiRJhzfpk8WSpAkzCCSpcQaBJDXO
IJCkxhkEktQ4g0CSGmcQSFLjDAJJapxBIEmNMwgkqXGHnWIiyQNAHWp7VT3xmFckSVpUhw2CqjodIMk7gK8BfwQEeDlw5mHeKkk6QYw7NLSmqt5fVQ9U1beq6nfwsZOSdFIYNwi+neTlSZYmWZLk5cC3+yxMkrQ4xg2Cfw38NPDX3etl3TpJ0glurOcRVNVdOBQkSSelsY4Ikjw9yaeSfKlrPzvJL/dbmiRpMYw7NPR7DJ4k9h2AqrqNwTOIJUknuHGD4O9V1U3z1h041sVIkhbfuEFwb5Kn0t1cluQyBvcVSJJOcOMGwWuBK4FzkuwHfh54zUJvSrI6ye4ke5JsOEy/lyapJNNj1iNJOkbGumoI+MuquijJacCSqnpgoTckWQpsAi4G9gHbk8xU1a55/U4H3gB84chKlyQdC+MeEdyZZDPwI8CDY77nfGBPVe2tqoeALYy+BPUdwDuBvx3zcyVJx9C4QXAO8D8ZDBHdmeR9SX50gfecBdw91N7XrfueJM8FllfVnx3ug5KsSzKbZHZubm7MkiVJ4xgrCKrqb6rq6qr6V8BzgCcC1z+WHSdZArwH+IUx9r+5qqaranpqauqx7FaSNM/YzyNIcmGS9wM7gMczmHLicPYDy4fay7p1B50OPBP4TJK7GAw7zXjCWJIW11gni7s/1F8ErgbeUlXjTDi3HViVZCWDAFjL0PxEVfVN4IyhfXwGeHNVzY5bvCTpsRv3qqFnV9W3juSDq+pAkvXANmApcFVV7UyyEZitqpkjrFWS1IOFnlD21qp6F/BrSR71pLKqev3h3l9VW4Gt89ZdcYi+L1ywWknSMbfQEcEd3U+HayTpJLXQoyo/1i3eXlU3L0I9kqRFNu5VQ7+Z5I4k70jyzF4rkiQtqnHvI3gR8CJgDrgyye0+j0CSTg5j30dQVX9VVb/FYLK5W4CRJ30lSSeWcZ9Q9owkb09yO/DbwOcY3CAmSTrBjXsfwVUMJo37iar6Pz3WI0laZAsGQTed9J1V9Z8WoR5J0iJbcGioqh4Glic5dRHqkSQtsnGHhu4EbkgyA3xvnqGqek8vVUmSFs24QfCV7rWEwayhkqSTxFhBUFW/2nchkqTJGHca6uuAUZPO/dgxr0iStKjGHRp689Dy44GXAgeOfTmSpMU27tDQjnmrbkhyUw/1SJIW2bhDQ39/qLkEmAZ+oJeKJEmLatyhoR08co7gAHAX8Oo+CpIkLa6FnlD2fODuqlrZtV/J4PzAXcCu3quTJPVuoTuLrwQeAkjyAuDXgQ8C3wQ291uaJGkxLBQES6vq693yzwCbq+qjVfUrwNMW+vAkq5PsTrInyYYR21/TPdvgliSfTXLukf8KkqTHYsEgSHJw+OjHgU8PbVtoWGkpsAm4BDgXuHzEH/oPVdWzquo84F2AU1ZI0iJb6GTxh4Hrk9wL/F/gfwEkeRqD4aHDOR/YU1V7u/dsAS5l6NxCVX1rqP9pjLhpTZLUr4UeXv9rST4FnAl8sqoO/qFeArxugc8+C7h7qL0P+OH5nZK8FngTcCow8k7lJOuAdQBnn332AruVJB2JcaahvrGq/qSqhmcd/XJV3XwsCqiqTVX1VOAXgZHPQa6qzVU1XVXTU1NTx2K3kqTO2M8sPgr7geVD7WXdukPZArykx3okSSP0GQTbgVVJVnYPtVkLzAx3SLJqqPli4C96rEeSNMK4dxYfsao6kGQ9sA1YClxVVTuTbARmq2oGWJ/kIuA7wP3AK/uqR5I0Wm9BAFBVW4Gt89ZdMbT8hj73L0laWJ9DQ5KkE4BBIEmNMwgkqXG9niNYbM97yx9OuoRjbse7/82kS5B0kvOIQJIaZxBIUuMMAklqnEEgSY0zCCSpcSfVVUOSNK73/cLHJl3CMbf+N//lUb3PIwJJapxBIEmNMwgkqXEGgSQ1ziCQpMYZBJLUOINAkhpnEEhS4wwCSWqcQSBJjes1CJKsTrI7yZ4kG0Zsf1OSXUluS/KpJE/psx5J0qP1
FgRJlgKbgEuAc4HLk5w7r9sXgemqejZwDfCuvuqRJI3W5xHB+cCeqtpbVQ8BW4BLhztU1XVV9Tdd80ZgWY/1SJJG6DMIzgLuHmrv69YdyquBT4zakGRdktkks3Nzc8ewREnScXGyOMkrgGng3aO2V9XmqpququmpqanFLU6STnJ9Po9gP7B8qL2sW/f/SXIR8EvAhVX1dz3WI0kaoc8jgu3AqiQrk5wKrAVmhjskeQ5wJbCmqu7psRZJ0iH0FgRVdQBYD2wD7gCurqqdSTYmWdN1ezfwBOC/JrklycwhPk6S1JNeH1VZVVuBrfPWXTG0fFGf+5ckLey4OFksSZocg0CSGmcQSFLjDAJJapxBIEmNMwgkqXEGgSQ1ziCQpMYZBJLUOINAkhpnEEhS4wwCSWqcQSBJjTMIJKlxBoEkNc4gkKTGGQSS1DiDQJIaZxBIUuN6DYIkq5PsTrInyYYR21+Q5OYkB5Jc1mctkqTReguCJEuBTcAlwLnA5UnOndftq8CrgA/1VYck6fBO6fGzzwf2VNVegCRbgEuBXQc7VNVd3bbv9liHJOkw+hwaOgu4e6i9r1t3xJKsSzKbZHZubu6YFCdJGjghThZX1eaqmq6q6ampqUmXI0knlT6DYD+wfKi9rFsnSTqO9BkE24FVSVYmORVYC8z0uD9J0lHoLQiq6gCwHtgG3AFcXVU7k2xMsgYgyfOT7ANeBlyZZGdf9UiSRuvzqiGqaiuwdd66K4aWtzMYMpIkTcgJcbJYktQfg0CSGmcQSFLjDAJJapxBIEmNMwgkqXEGgSQ1ziCQpMYZBJLUOINAkhpnEEhS4wwCSWqcQSBJjTMIJKlxBoEkNc4gkKTGGQSS1DiDQJIaZxBIUuMMAklqXK9BkGR1kt1J9iTZMGL79yX5SLf9C0lW9FmPJOnReguCJEuBTcAlwLnA5UnOndft1cD9VfU04L3AO/uqR5I0Wp9HBOcDe6pqb1U9BGwBLp3X51Lgg93yNcCPJ0mPNUmS5klV9fPByWXA6qr6ua79s8APV9X6oT5f6vrs69pf6frcO++z1gHruuYPAbt7KfrInAHcu2CvNvhdDPg9PMLv4hHHy3fxlKqaGrXhlMWu5GhU1WZg86TrGJZktqqmJ13H8cDvYsDv4RF+F484Eb6LPoeG9gPLh9rLunUj+yQ5BfgB4L4ea5IkzdNnEGwHViVZmeRUYC0wM6/PDPDKbvky4NPV11iVJGmk3oaGqupAkvXANmApcFVV7UyyEZitqhng94E/SrIH+DqDsDhRHFdDVRPmdzHg9/AIv4tHHPffRW8niyVJJwbvLJakxhkEktQ4g+AIJLkqyT3d/Q9NS7I8yXVJdiXZmeQNk65pUpI8PslNSW7tvotfnXRNk5ZkaZIvJvn4pGuZpCR3Jbk9yS1JZiddz6F4juAIJHkB8CDwh1X1zEnXM0lJzgTOrKqbk5wO7ABeUlW7Jlzaouvuhj+tqh5M8jjgs8AbqurGCZc2MUneBEwDT6yqn5x0PZOS5C5gev5NsscbjwiOQFX9OYOrm5pXVV+rqpu75QeAO4CzJlvVZNTAg13zcd2r2f+wkiwDXgx8YNK1aDwGgR6zbtbY5wBfmHApE9MNhdwC3ANcW1XNfhfAfwTeCnx3wnUcDwr4ZJId3VQ5xyWDQI9JkicAHwV+vqq+Nel6JqWqHq6q8xjcQX9+kiaHDpP8JHBPVe2YdC3HiR+tqucymIX5td3w8nHHINBR68bDPwr8cVX9t0nXczyoqm8A1wGrJ1zKpFwArOnGxrcAP5bkv0y2pMmpqv3dz3uAP2EwK/NxxyDQUelOkP4+cEdVvWfS9UxSkqkkP9gtfz9wMfC/J1rUhFTV26pqWVWtYDBTwKer6hUTLmsikpzWXUhBktOAfw4cl1ccGgRHIMmHgc8DP5RkX5JXT7qmCboA+FkG//Hd0r3+xaSLmpAzgeuS3MZgjq1rq6rpyyYFwJOBzya5FbgJ+LOq+h8Trmkk
Lx+VpMZ5RCBJjTMIJKlxBoEkNc4gkKTGGQSS1DiDQCelJP9g6LLWv0qyv1t+MMn7F3jvg4fbPq/vC5P803nrXpHktm4m0luTfODgfQbS8ai3R1VKk1RV9wHnASR5O/BgVf1GD7t6IYMZaT/X7Ws18Ebgkqran2Qpg+dyPxn4xvAbkyytqod7qEk6Ih4RqCndf/Af75afkOQ/d/PF35bkpfP6npHk80le3N09/NEk27vXBd1ke68B3tgdbfwz4JeANw9NLfBwVV1VVbu7z7wryTuT3Ay8LMnl3f6/lOSdQ/t+cGj5siR/0C3/QZLfTTKb5Mvd3D7SY+IRgVr2K8A3q+pZAEmedHBDkicDM8AvV9W1ST4EvLeqPpvkbGBbVT0jye8ydLSR5J8ANy+w3/uq6rlJ/hFwI/A84H4Gs1S+pKr+dIH3r2AwZ81TGdzR/LSq+tsj+9WlR3hEoJZdBGw62Kiq+7vFxwGfAt5aVdcO9X1fN9X0DPDEbubVQ0ryrO5I4StJfmZo00e6n88HPlNVc1V1APhjYJzZKa+uqu9W1V8Ae4FzxniPdEgGgfRoBxg8ce0nhtYtAX6kqs7rXmcNPYxm2E7guQBVdXs3NfUngO8f6vPtMWoYnvvl8YfZNqotHRGDQC27FnjtwcbQ0FAB/xY4J8kvdus+CbxuqO953eIDwOlDn/nrwG90T+k6aDgEht0EXNidi1gKXA5c32376yTPSLIE+Kl573tZkiVJngr8Y2D3gr+pdBgGgVr2H4AndSdqbwVedHBDdzXP5QxmV/33wOuB6e6k8i4GJ4kBPgb81MGTxVW1Ffgt4BNJdiX5HPAwsG3+zqvqa8AGBs8vuBXYUVX/vdu8Afg4g6uRvjbvrV9lECKfAF7j+QE9Vs4+Kp1AuquHPl5V10y6Fp08PCKQpMZ5RCBJjfOIQJIaZxBIUuMMAklqnEEgSY0zCCSpcf8Pd7WAlYcf90AAAAAASUVORK5CYII=\n",
      "text/plain": [
       "<Figure size 432x288 with 1 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    }
   ],
   "source": [
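    "# From the analysis above, groups of 5 or more passengers sharing a ticket have similar survival rates and are few in number, so they are merged into a single class (5 means 5 or more)\n",
    "def TicketGroup(s):\n",
    "    if s > 4:\n",
    "        return 5\n",
    "    else:\n",
    "        return s\n",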
    "\n",
    "data['TicketGroup']=data['TicketGroup'].apply(TicketGroup)\n",
    "sns.barplot(x=\"TicketGroup\", y=\"Survived\", data=data,ci=None)\n",
    "del data['Ticket']"
   ]
  },
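  {
   "cell_type": "markdown",
   "id": "7b4d0e51",
   "metadata": {},
   "source": [
    "    The two Ticket cells above can also be expressed with vectorized pandas calls. This is a sketch on a hypothetical toy ticket column, not the notebook's own code: groupby(...).transform('count') gives each passenger the size of their ticket group, and clip(upper=5) reproduces the TicketGroup() binning.\n",
    "\n",
    "```python\n",
    "import pandas as pd\n",
    "\n",
    "# Toy ticket column (hypothetical values); 'T5' is shared by 6 passengers\n",
    "tickets = pd.DataFrame({'Ticket': ['A1', 'A1', 'B2'] + ['T5'] * 6})\n",
    "\n",
    "# Group size per passenger: the same count the Ticket_Count dict lookup produces\n",
    "tickets['TicketGroup'] = tickets.groupby('Ticket')['Ticket'].transform('count')\n",
    "\n",
    "# Cap group sizes at 5, matching the TicketGroup() binning (5 means 5 or more)\n",
    "tickets['TicketGroup'] = tickets['TicketGroup'].clip(upper=5)\n",
    "print(tickets['TicketGroup'].tolist())  # [2, 2, 1, 5, 5, 5, 5, 5, 5]\n",
    "```"
   ]
  },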
  {
   "cell_type": "markdown",
   "id": "63bbf803",
   "metadata": {},
   "source": [
    "(3) Processing the string feature Cabin\n",
    "\n",
    "    Inspecting the Cabin feature shows that close to 80% of its values are missing, so the column is simply dropped."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 302,
   "id": "47f3cb83",
   "metadata": {},
   "outputs": [],
   "source": [
    "del data['Cabin']"
   ]
  },
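  {
   "cell_type": "markdown",
   "id": "c2e8a4f6",
   "metadata": {},
   "source": [
    "    A small sketch of how the missing rate behind this decision can be checked, on a hypothetical toy frame. The 0.5 threshold used to drop columns is an arbitrary illustrative choice, not a rule taken from this notebook.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "import pandas as pd\n",
    "\n",
    "# Toy frame (hypothetical values): Cabin is 80% missing, Age 20% missing\n",
    "df = pd.DataFrame({\n",
    "    'Cabin': ['C85', np.nan, np.nan, np.nan, np.nan],\n",
    "    'Age': [22.0, np.nan, 26.0, 35.0, 35.0],\n",
    "})\n",
    "\n",
    "# Fraction of missing values per column\n",
    "missing_rate = df.isna().mean()\n",
    "print(missing_rate.to_dict())  # {'Cabin': 0.8, 'Age': 0.2}\n",
    "\n",
    "# Drop columns whose missing rate exceeds the (arbitrary) 0.5 threshold\n",
    "df = df.drop(columns=missing_rate[missing_rate > 0.5].index)\n",
    "print(df.columns.tolist())  # ['Age']\n",
    "```"
   ]
  },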
  {
   "cell_type": "markdown",
   "id": "546daa0f",
   "metadata": {},
   "source": [
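    "(4) Processing the remaining string features\n",
    "\n",
    "    The remaining string features are Sex, Embarked and Title.\n",
    "    \n",
    "    Since these are all categorical, they are one-hot encoded directly. Note also that:\n",
    "    \n",
    "    1) Parch and SibSp are both family-related features and can be combined into a new feature, FamilySize, which also needs one-hot encoding, so it is built first in this part.\n",
    "    \n",
    "    2) TicketGroup is also a categorical feature and can be one-hot encoded at the same time.\n",
    "    \n",
    "    In summary, the features to one-hot encode are Pclass (integer-coded but categorical), Sex, Embarked, Title, FamilySize and TicketGroup."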
   ]
  },
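  {
   "cell_type": "markdown",
   "id": "e61b9d07",
   "metadata": {},
   "source": [
    "    A minimal sketch of the two steps described above, on hypothetical toy rows. It assumes FamilySize = SibSp + Parch + 1 (the passenger plus relatives on board), which is consistent with the FamilySize counts in the next cell starting at 1, but the notebook's own code may differ. pd.get_dummies only encodes object and category columns by default, so integer columns such as Pclass and FamilySize are passed explicitly via columns=.\n",
    "\n",
    "```python\n",
    "import pandas as pd\n",
    "\n",
    "# Toy rows (hypothetical values) with the relevant columns\n",
    "toy = pd.DataFrame({\n",
    "    'Pclass': [3, 1, 3],\n",
    "    'Sex': ['male', 'female', 'female'],\n",
    "    'SibSp': [1, 1, 0],\n",
    "    'Parch': [0, 0, 2],\n",
    "})\n",
    "\n",
    "# Assumed definition: siblings/spouses + parents/children + the passenger\n",
    "toy['FamilySize'] = toy['SibSp'] + toy['Parch'] + 1\n",
    "\n",
    "# One-hot encode; integer columns must be listed explicitly to be encoded\n",
    "encoded = pd.get_dummies(toy, columns=['Pclass', 'Sex', 'FamilySize'])\n",
    "print(sorted(encoded.columns))\n",
    "```\n",
    "\n",
    "    In pandas 2.x the dummy columns are boolean; older versions return uint8."
   ]
  },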
  {
   "cell_type": "code",
   "execution_count": 303,
   "id": "160fe536",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "1     788\n",
       "2     235\n",
       "3     159\n",
       "4      43\n",
       "6      25\n",
       "5      22\n",
       "7      16\n",
       "11     11\n",
       "8       8\n",
       "Name: FamilySize, dtype: int64"
      ]
     },
     "execution_count": 303,
     "metadata": {},
     "output_type": "execute_result"
    },
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAYIAAAEGCAYAAABo25JHAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuMywgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/Il7ecAAAACXBIWXMAAAsTAAALEwEAmpwYAAAVIElEQVR4nO3df7RdZX3n8feHIPUHiJ3FbReSQLI0illqQW9TWlygAtMw7YS2ahuqVTrajGsRtfVXcZwyisvOqm21nZq6TC3WsUiKWGdda6bRAaojUzQ3iGCShkZ+SKItF7TijxYIfOePs0NPb05yT+Duc26y36+1zuI8ez9n728uyfnc/ey9n52qQpLUXUeNuwBJ0ngZBJLUcQaBJHWcQSBJHWcQSFLHHT3uAg7VCSecUEuXLh13GZJ0WNm6des9VTUxaN1hFwRLly5lenp63GVI0mElyZ0HWufQkCR1nEEgSR1nEEhSxxkEktRxBoEkdZxBIEkdZxBIUscZBJLUcQaBJHXcYXdnsY4cnzvr7LHs9+zPf24s+5UWKo8IJKnjDAJJ6rhWgyDJqiQ7k+xKcsmA9e9LclPzujXJP7VZjyRpf62dI0iyCFgPnAfsBrYkmaqq7fv6VNVv9PV/HXB6W/VIkgZr84hgJbCrqm6rqgeAjcAFB+l/IXBli/VIkgZoMwhOAu7qa+9ulu0nySnAMuDaFuuRJA2wUE4WrwGurqqHBq1MsjbJdJLpmZmZEZcmSUe2NoNgD7Ckr724WTbIGg4yLFRVG6pqsqomJyYGPmlNkvQotRkEW4DlSZYlOYbel/3U7E5JTgV+GPjbFmuRJB1Aa0FQVXuBdcBmYAdwVVVtS3JZktV9XdcAG6uq2qpFknRgrU4xUVWbgE2zll06q/2ONmuQJB3cQjlZLEkaE4NAkjrOIJCkjjMIJKnjDAJJ6jiDQJI6ziCQpI4zCCSp4wwCSeo4g0CSOs4gkKSOMwgkqeMMAknqOINAkjrOIJCkjjMIJKnjDAJJ6jiDQJI6ziCQpI4zCCSp41oNgiSrkuxMsivJJQfo84tJtifZluRjbdYjSdrf0W1tOMkiYD1wHrAb2JJkqqq29/VZDrwNOLOqvp3kR9qqR5I0WJtHBCuBXVV1W1U9AGwELpjV59eA9VX1bYCqurvFeiRJA7QZBCcBd/W1dzfL+j0DeEaS65PckGTVoA0lWZtkOsn0zMxMS+VKUjeN+2Tx0cBy4IXAhcCfJHnK7E5VtaGqJqtqcmJiYrQVStIRrs0g2AMs6Wsvbpb12w1MVdWDVXU7cCu9YJAkjUibQbAFWJ5kWZJjgDXA1Kw+/4ve0QBJTqA3VHRbizVJkmZpLQiqai+wDtgM7ACuqqptSS5Lsrrpthm4N8l24DrgLVV1b1s1SZL219rlowBVtQnYNGvZpX3vC3hj85IkjUGrQaCF48w/OnMs+73+ddePZb+Shjfuq4YkSWNmEEhSxxkEktRxBoEkdZxBIEkdZxBIUscZBJLUcd5H0IKvX/acsez35EtvGct+JR3ePCKQpI4zCCSp4wwCSeo4g0CSOs4gkKSOMwgkqeMMAknqOINAkjrOIJCkjjMIJKnjDAJJ6rhWgyDJqiQ7k+xKcsmA9RclmUlyU/N6TZv1SJL219qkc0kWAeuB84DdwJYkU1W1fVbXv6iqdW3VIUk6uDaPCFYCu6rqtqp6ANgIXNDi/iRJj0KbQXAScFdfe3ezbLaXJLk5ydVJlgzaUJK1SaaTTM/MzLRRqyR11rhPFn8KWFpVzwU+C3xkUKeq2lBVk1U1OTExMdICJelI12YQ7AH6f8Nf3Cx7RFXdW1X3N80PAc9vsR5J0gBtBsEWYHmSZUmOAdYAU/0dkpzY11wN7GixHknSAK1dNVRVe5OsAzYDi4DLq2pbksuA6aqaAl6fZDWwF/gWcFFb9UiSBmv1mcVVtQnYNGvZpX3v3wa8rc0aJEkHN+6TxZKkMTMIJKnj
DAJJ6jiDQJI6ziCQpI4zCCSp4wwCSeo4g0CSOs4gkKSOMwgkqeMOOsVEku8CdaD1VfXkea9IkjRSBw2CqjoOIMm7gG8CHwUCvBw48SAflSQdJoYdGlpdVX9cVd+tqvuq6gP42ElJOiIMGwTfT/LyJIuSHJXk5cD32yxMkjQawwbBLwO/CPxj83pZs0ySdJgb6nkEVXUHDgVJ0hFpqCOCJM9Ick2Srzbt5yb5r+2WJkkahWGHhv6E3pPEHgSoqpvpPYNYknSYGzYInlhVX5q1bO98FyNJGr1hg+CeJE+jubksyUvp3VdwUElWJdmZZFeSSw7S7yVJKsnkkPVIkubJsA+vvxjYAJyaZA9wO72byg4oySJgPXAesBvYkmSqqrbP6ncc8Abgi4dYuyRpHgx7RHBnVZ0LTACnVtULqurOOT6zEthVVbdV1QPARgZfefQu4HeAfxm2aEnS/Bk2CG5PsgE4A/jekJ85Cbirr727WfaIJM8DllTVpw+2oSRrk0wnmZ6ZmRly95KkYQwbBKcC/4feENHtSd6f5AWPZcdJjgLeC7xprr5VtaGqJqtqcmJi4rHsVpI0y1BBUFU/qKqrquoXgNOBJwOfm+Nje4Alfe3FzbJ9jgOeDfxNkjvoHW1MecJYkkZr6OcRJDk7yR8DW4HH05ty4mC2AMuTLEtyDL37Dqb2rayq71TVCVW1tKqWAjfQm9xu+lD/EJKkR2+oq4aa39i/DFwFvKWq5pxwrqr2JlkHbAYWAZdX1bYklwHTVTV18C1IkkZh2MtHn1tV9x3qxqtqE7Bp1rJLD9D3hYe6fUnSYzfXE8reWlXvAd6dZL8nlVXV61urTJI0EnMdEexo/uu4vSQdoeZ6VOWnmre3VNWNI6hHkjRiw1419PtJdiR5V5Jnt1qRJGmkhr2P4EXAi4AZ4INJbvF5BJJ0ZBj6PoKq+oeq+h/Aa4GbgIFX/0iSDi/DPqHsWUnekeQW4I+A/0fvTmFJ0mFu2PsILqc3e+hPV9U3WqxHkjRicwZB81yB26vqD0dQjyRpxOYcGqqqh4AlzXxBkqQjzLBDQ7cD1yeZAh6ZZ6iq3ttKVZKkkRk2CL7WvI6iN320JOkIMVQQVNU72y5EkjQew05DfR0waNK5F897RZIOCzvefe1Y9vust/u1M9+GHRp6c9/7xwMvAfbOfzmSpFEbdmho66xF1yf5Ugv1SJJGbNihoX/X1zwKmASOb6UiSdJIDTs0tJV/PUewF7gDeHUbBUmSRmuuJ5T9OHBXVS1r2q+id37gDmB769VJklo3153FHwQeAEhyFvDfgY8A3wE2tFuaJGkU5gqCRVX1reb9LwEbquoTVfVbwNPn2niSVUl2JtmV5JIB61/bPNvgpiRfSLLi0P8IkqTHYs4gSLJv+OgcoP/C4bmGlRYB64HzgRXAhQO+6D9WVc+pqtOA9wBOWSFJIzbXyeIrgc8luQf4Z+D/AiR5Or3hoYNZCeyqqtuaz2wELqDv3EJV3dfX/0kMuGlNktSuuR5e/+4k1wAnAp+pqn1f1EcBr5tj2ycBd/W1dwM/MbtTkouBNwLHAANvGUyyFlgLcPLJJ/+bdc9/y/+co4x2bP3dV45lv5I034aZhvqGqvpkVfXPOnprVd04HwVU1fqqehrwm8DA5yBX1YaqmqyqyYmJifnYrSSpMfQzix+FPcCSvvbiZtmBbAR+rsV6JEkDtBkEW4DlSZY1D7VZA0z1d0iyvK/5M8Dft1iPJGmAYe8sPmRVtTfJOmAzsAi4vKq2JbkMmK6qKWBdknOBB4FvA69qqx5J0mCtBQFAVW0CNs1admnf+ze0uX9J0tzaHBqSJB0GDAJJ6jiDQJI6ziCQpI4zCCSp4wwCSeo4g0CSOs4gkKSOMwgkqeMMAknqOINAkjrOIJCkjjMIJKnjDAJJ6jiDQJI6ziCQpI4zCCSp4wwCSeo4g0CSOq7VIEiyKsnOJLuSXDJg/RuTbE9yc5JrkpzSZj2SpP21FgRJ
FgHrgfOBFcCFSVbM6vZlYLKqngtcDbynrXokSYO1eUSwEthVVbdV1QPARuCC/g5VdV1V/aBp3gAsbrEeSdIAbQbBScBdfe3dzbIDeTXwv1usR5I0wNHjLgAgySuASeDsA6xfC6wFOPnkk0dYmbro/W/61Mj3ue73/+PI9ynt0+YRwR5gSV97cbPs30hyLvB2YHVV3T9oQ1W1oaomq2pyYmKilWIlqavaDIItwPIky5IcA6wBpvo7JDkd+CC9ELi7xVokSQfQWhBU1V5gHbAZ2AFcVVXbklyWZHXT7XeBY4GPJ7kpydQBNidJakmr5wiqahOwadayS/ven9vm/iVJc/POYknqOINAkjrOIJCkjjMIJKnjDAJJ6jiDQJI6ziCQpI4zCCSp4wwCSeo4g0CSOs4gkKSOMwgkqeMMAknquAXxhDJJB/fuV7x0LPt9+59fPZb9arQ8IpCkjjMIJKnjDAJJ6jiDQJI6ziCQpI4zCCSp41oNgiSrkuxMsivJJQPWn5XkxiR7k4zn+jhJ6rjWgiDJImA9cD6wArgwyYpZ3b4OXAR8rK06JEkH1+YNZSuBXVV1G0CSjcAFwPZ9Harqjmbdwy3WIUk6iDaHhk4C7upr726WHbIka5NMJ5memZmZl+IkST2HxcniqtpQVZNVNTkxMTHuciTpiNJmEOwBlvS1FzfLJEkLSJtBsAVYnmRZkmOANcBUi/uTJD0KrQVBVe0F1gGbgR3AVVW1LcllSVYDJPnxJLuBlwEfTLKtrXokSYO1Og11VW0CNs1admnf+y30howkSWNyWJwsliS1xyCQpI4zCCSp4wwCSeo4g0CSOs4gkKSOMwgkqeMMAknqOINAkjrOIJCkjjMIJKnjDAJJ6jiDQJI6ziCQpI4zCCSp4wwCSeo4g0CSOs4gkKSOMwgkqeMMAknquFaDIMmqJDuT7EpyyYD1P5TkL5r1X0yytM16JEn7ay0IkiwC1gPnAyuAC5OsmNXt1cC3q+rpwPuA32mrHknSYG0eEawEdlXVbVX1ALARuGBWnwuAjzTvrwbOSZIWa5IkzZKqamfDyUuBVVX1mqb9K8BPVNW6vj5fbfrsbtpfa/rcM2tba4G1TfOZwM55KvME4J45e42WNQ3Hmoa3EOuypuHMZ02nVNXEoBVHz9MOWlVVG4AN873dJNNVNTnf230srGk41jS8hViXNQ1nVDW1OTS0B1jS117cLBvYJ8nRwPHAvS3WJEmapc0g2AIsT7IsyTHAGmBqVp8p4FXN+5cC11ZbY1WSpIFaGxqqqr1J1gGbgUXA5VW1LcllwHRVTQF/Cnw0yS7gW/TCYpTmfbhpHljTcKxpeAuxLmsazkhqau1ksSTp8OCdxZLUcQaBJHVcJ4MgyeVJ7m7uYxi7JEuSXJdke5JtSd4w7poAkjw+yZeSfKWp653jrmmfJIuSfDnJX427FoAkdyS5JclNSabHXQ9AkqckuTrJ3yXZkeQnF0BNz2x+Rvte9yX59QVQ1280f8e/muTKJI8fQw37fS8leVlT18NJWruMtJNBAPwZsGrcRfTZC7ypqlYAZwAXD5iOYxzuB15cVT8GnAasSnLGeEt6xBuAHeMuYpYXVdVpC+ha9D8E/rqqTgV+jAXw86qqnc3P6DTg+cAPgE+Os6YkJwGvByar6tn0Lm4Z9YUrMPh76avALwCfb3PHnQyCqvo8vauUFoSq+mZV3di8/y69f7AnjbcqqJ7vNc3HNa+xX12QZDHwM8CHxl3LQpXkeOAselfmUVUPVNU/jbWo/Z0DfK2q7hx3IfSuoHxCcz/TE4FvjLqAQd9LVbWjquZrJoUD6mQQLGTNDKynA18ccynAI0MwNwF3A5+tqoVQ1x8AbwUeHnMd/Qr4TJKtzZQo47YMmAE+3AyhfSjJk8Zd1CxrgCvHXURV7QF+D/g68E3gO1X1mfFWNVoGwQKS5FjgE8CvV9V9464HoKoeag7jFwMrkzx7nPUk+Vng7qraOs46BnhBVT2P
3my7Fyc5a8z1HA08D/hAVZ0OfB/Ybyr4cWluMl0NfHwB1PLD9CbAXAY8FXhSkleMt6rRMggWiCSPoxcCV1TVX467ntmaYYXrGP+5lTOB1UnuoDej7YuT/Pl4S3rkt0qq6m56Y94rx1sRu4HdfUdwV9MLhoXifODGqvrHcRcCnAvcXlUzVfUg8JfAT425ppEyCBaAZurtPwV2VNV7x13PPkkmkjylef8E4Dzg78ZZU1W9raoWV9VSekML11bVWH97S/KkJMftew/8e3on+camqv4BuCvJM5tF5wDbx1jSbBeyAIaFGl8HzkjyxObf4jksgBPro9TJIEhyJfC3wDOT7E7y6jGXdCbwK/R+u913Wd1/GHNNACcC1yW5md7cUZ+tqgVxueYC86PAF5J8BfgS8Omq+usx1wTwOuCK5v/facBvj7ecniYsz6P3m/fYNUdNVwM3ArfQ+14c+XQTg76Xkvx8kt3ATwKfTrK5lX07xYQkdVsnjwgkSf/KIJCkjjMIJKnjDAJJ6jiDQJI6ziBQJyR5aNasl0sf4/ZWJ7mkef+OJG+eo//PNlM9fKWZZfY/N8tfm+SVj6UW6bHy8lF1QpLvVdWxLW37HcD3qur3DrD+ccCdwMqq2p3kh4Clo5hMTBqGRwTqpCTHJrkmyY3NcwQuaJYvbebv/7Mktya5Ism5Sa5P8vdJVjb9Lkry/lnbfFqSG/vay5v2cfTm/rkXoKru3xcC+44mkjx11hHLQ0lOae7u/kSSLc3rzBH9iNQhrT28XlpgntDMogpwO/Ay4Oer6r4kJwA3JJlq1j+9Wf+f6N1R/cvAC+hNkvZfgJ8btIOq+lqS7yQ5rapuAn4V+HBVfavZ9p1JrgH+Criyqh7u++w36N39S5KLgbOr6s4kHwPeV1VfSHIysBl41rz8RKSGQaCu+OdmFlXgkeGa325mCX2Y3vMffrRZfXtV3dL02wZcU1WV5BZg6Rz7+RDwq0neCPwSzeRzVfWaJM+hN8HZm+lNsXDR7A83v/H/Gr3goem/ojcFDgBPTnJs33MipMfMIFBXvRyYAJ5fVQ82s5nuezzh/X39Hu5rP8zc/2Y+Afw34Fpga1Xdu29FEy63JPkovaOSi/o/mOREepMPru77oj8KOKOq/uWQ/nTSIfAcgbrqeHrPNXgwyYuAU+Zjo80X9mbgA8CH4ZHzES/s63YavZPHj2iOUD4O/GZV3dq36jP0Jo/b1++0+ahT6mcQqKuuACab4Z5XMr/Ta19B7+hh31OuArw1yc7mPMU72X9Y6KeASeCdfSeMn0rzLN0kNyfZDrx2HuuUAC8fleZdc0/B8VX1W+OuRRqG5wikeZTkk8DTgBePuxZpWB4RSFLHeY5AkjrOIJCkjjMIJKnjDAJJ6jiDQJI67v8D1UYDP9jPmCkAAAAASUVORK5CYII=\n",
      "text/plain": [
       "<Figure size 432x288 with 1 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    }
   ],
   "source": [
    "# 1) Combine Parch and SibSp into a single feature: FamilySize = SibSp + Parch + 1 (the passenger plus relatives aboard)\n",
    "data['FamilySize']=data['SibSp']+data['Parch']+1\n",
    "sns.barplot(x=\"FamilySize\", y=\"Survived\", data=data,ci=None)\n",
    "data['FamilySize'].value_counts()"
   ]
  },
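  {
   "cell_type": "markdown",
   "id": "3f1a9c20",
   "metadata": {},
   "source": [
    "The `FamilySize` step above can be sketched on a small toy frame. This is a minimal illustration, not the notebook's own code. The rows below are hypothetical. `groupby(...).mean()` computes the per-size survival rate that `sns.barplot` draws as bar heights, because its default estimator is the mean.\n",
    "\n",
    "```python\n",
    "import pandas as pd\n",
    "\n",
    "# Hypothetical toy rows standing in for the combined train/test frame `data`\n",
    "toy = pd.DataFrame({\n",
    "    'SibSp':    [1, 1, 0, 1, 0, 4],\n",
    "    'Parch':    [0, 0, 0, 0, 0, 2],\n",
    "    'Survived': [0, 1, 1, 1, 0, 0],\n",
    "})\n",
    "\n",
    "# Family size = siblings/spouses + parents/children + the passenger themself\n",
    "toy['FamilySize'] = toy['SibSp'] + toy['Parch'] + 1\n",
    "\n",
    "# Mean survival per family size, i.e. the bar heights of the barplot\n",
    "rate = toy.groupby('FamilySize')['Survived'].mean()\n",
    "print(toy['FamilySize'].tolist())  # [2, 2, 1, 2, 1, 7]\n",
    "print(rate)\n",
    "```"
   ]
  },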
  {
   "cell_type": "code",
   "execution_count": 304,
   "id": "aef9f8f1",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>PassengerId</th>\n",
       "      <th>Survived</th>\n",
       "      <th>Pclass</th>\n",
       "      <th>Sex</th>\n",
       "      <th>Age</th>\n",
       "      <th>SibSp</th>\n",
       "      <th>Parch</th>\n",
       "      <th>Fare</th>\n",
       "      <th>Embarked</th>\n",
       "      <th>Title</th>\n",
       "      <th>TicketGroup</th>\n",
       "      <th>FamilySize</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1</td>\n",
       "      <td>0.0</td>\n",
       "      <td>3</td>\n",
       "      <td>male</td>\n",
       "      <td>22.0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>7.2500</td>\n",
       "      <td>S</td>\n",
       "      <td>Mr</td>\n",
       "      <td>1</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1</td>\n",
       "      <td>female</td>\n",
       "      <td>38.0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>71.2833</td>\n",
       "      <td>C</td>\n",
       "      <td>Mrs</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>3</td>\n",
       "      <td>1.0</td>\n",
       "      <td>3</td>\n",
       "      <td>female</td>\n",
       "      <td>26.0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>7.9250</td>\n",
       "      <td>S</td>\n",
       "      <td>Miss</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>4</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1</td>\n",
       "      <td>female</td>\n",
       "      <td>35.0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>53.1000</td>\n",
       "      <td>S</td>\n",
       "      <td>Mrs</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>5</td>\n",
       "      <td>0.0</td>\n",
       "      <td>3</td>\n",
       "      <td>male</td>\n",
       "      <td>35.0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>8.0500</td>\n",
       "      <td>S</td>\n",
       "      <td>Mr</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1304</th>\n",
       "      <td>1305</td>\n",
       "      <td>NaN</td>\n",
       "      <td>3</td>\n",
       "      <td>male</td>\n",
       "      <td>NaN</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>8.0500</td>\n",
       "      <td>S</td>\n",
       "      <td>Mr</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1305</th>\n",
       "      <td>1306</td>\n",
       "      <td>NaN</td>\n",
       "      <td>1</td>\n",
       "      <td>female</td>\n",
       "      <td>39.0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>108.9000</td>\n",
       "      <td>C</td>\n",
       "      <td>Royalty</td>\n",
       "      <td>3</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1306</th>\n",
       "      <td>1307</td>\n",
       "      <td>NaN</td>\n",
       "      <td>3</td>\n",
       "      <td>male</td>\n",
       "      <td>38.5</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>7.2500</td>\n",
       "      <td>S</td>\n",
       "      <td>Mr</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1307</th>\n",
       "      <td>1308</td>\n",
       "      <td>NaN</td>\n",
       "      <td>3</td>\n",
       "      <td>male</td>\n",
       "      <td>NaN</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>8.0500</td>\n",
       "      <td>S</td>\n",
       "      <td>Mr</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1308</th>\n",
       "      <td>1309</td>\n",
       "      <td>NaN</td>\n",
       "      <td>3</td>\n",
       "      <td>male</td>\n",
       "      <td>NaN</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>22.3583</td>\n",
       "      <td>C</td>\n",
       "      <td>Master</td>\n",
       "      <td>3</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>1307 rows × 12 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "      PassengerId  Survived  Pclass     Sex   Age  SibSp  Parch      Fare  \\\n",
       "0               1       0.0       3    male  22.0      1      0    7.2500   \n",
       "1               2       1.0       1  female  38.0      1      0   71.2833   \n",
       "2               3       1.0       3  female  26.0      0      0    7.9250   \n",
       "3               4       1.0       1  female  35.0      1      0   53.1000   \n",
       "4               5       0.0       3    male  35.0      0      0    8.0500   \n",
       "...           ...       ...     ...     ...   ...    ...    ...       ...   \n",
       "1304         1305       NaN       3    male   NaN      0      0    8.0500   \n",
       "1305         1306       NaN       1  female  39.0      0      0  108.9000   \n",
       "1306         1307       NaN       3    male  38.5      0      0    7.2500   \n",
       "1307         1308       NaN       3    male   NaN      0      0    8.0500   \n",
       "1308         1309       NaN       3    male   NaN      1      1   22.3583   \n",
       "\n",
       "     Embarked    Title  TicketGroup  FamilySize  \n",
       "0           S       Mr            1           2  \n",
       "1           C      Mrs            2           2  \n",
       "2           S     Miss            1           1  \n",
       "3           S      Mrs            2           2  \n",
       "4           S       Mr            1           1  \n",
       "...       ...      ...          ...         ...  \n",
       "1304        S       Mr            1           1  \n",
       "1305        C  Royalty            3           1  \n",
       "1306        S       Mr            1           1  \n",
       "1307        S       Mr            1           1  \n",
       "1308        C   Master            3           3  \n",
       "\n",
       "[1307 rows x 12 columns]"
      ]
     },
     "execution_count": 304,
     "metadata": {},
     "output_type": "execute_result"
    },
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAYIAAAEGCAYAAABo25JHAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuMywgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/Il7ecAAAACXBIWXMAAAsTAAALEwEAmpwYAAATuklEQVR4nO3df5BdZ33f8ffHMg4pNqQZbzOOJVsaECEeYEzYKJmawUDsVDRUogUSOyTgFKIyg4CWABFt44KYtBPSQNugZFDAKaGA4uCQWRKlgtouKW4MWhlhR1LkCNmOpaS1bAjG+YEt+9s/7pF9u77SXkl79kp63q+ZO7rPOc8997tnRvvZ85xznpOqQpLUrrMmXYAkabIMAklqnEEgSY0zCCSpcQaBJDXu7EkXcLzOP//8Wr58+aTLkKTTyo4dO+6vqqlR6067IFi+fDmzs7OTLkOSTitJ7jnaOoeGJKlxBoEkNc4gkKTGGQSS1DiDQJIaZxBIUuMMAklqnEEgSY0zCCSpcafdncWSTtwXXnz5pEtYcJf/0RcmXcJpzyMCSWqcQSBJjes1CJKsTrI3yb4kG0as/2CSnd3rziR/1Wc9kqQn6+0cQZIlwCbgSuAAsD3JTFXtPtKnqv7VUP+3AC/oqx5J0mh9HhGsAvZV1f6qehjYAqw9Rv+rgU/1WI8kaYQ+g+BC4N6h9oFu2ZMkuRhYAdzUYz2SpBFOlZPFVwGfrqpHR61Msi7JbJLZQ4cOLXJpknRm6zMIDgLLhtpLu2WjXMUxhoWqanNVTVfV9NTUyCetSZJOUJ9BsB1YmWRFknMY/LKfmdspyXOAvw/8cY+1SJKOorcgqKrDwHpgG7AHuL6qdiXZmGTNUNergC1VVX3VIkk6ul6nmKiqrcDWOcuundN+T581SJKO7VQ5WSxJmhCDQJIaZxBIUuMMAklqnEEgSY0zCCSpcQaBJDXOIJCkxhkEktQ4g0CSGmcQSFLjDAJJapxBIEmNMwgkqXEGgSQ1ziCQpMYZBJLUOINAkhpnEEhS4wwCSWpcr0GQZHWSvUn2JdlwlD4/nmR3kl1JPtlnPZKkJzu7rw0nWQJsAq4EDgDbk8xU1e6hPiuBdwOXVdU3kvyDvuqRJI3W5xHBKmBfVe2vqoeBLcDaOX1+FthUVd8AqKr7eqxHkjRCn0FwIXDvUPtAt2zYs4FnJ7klya1JVo/aUJJ1SWaTzB46dKinciWpTZM+WXw2sBJ4CXA18BtJvmtup6raXFXTVTU9NTW1uBVK0hmuzyA4CCwbai/tlg07AMxU1SNVdRdwJ4NgkCQtkj6DYDuwMsmKJOcAVwEzc/r8HoOjAZKcz2CoaH+PNUmS5ugtCKrqMLAe2AbsAa6vql1JNiZZ03XbBjyQZDdwM/DOqnqgr5okSU/W2+WjAFW1Fdg6Z9m1Q+8LeHv3kiRNQK9BIJ0KLvvVyyZdwoK75S23TLoEnUEmfdWQJGnCDAJJapxBIEmNMwgkqXEGgSQ1ziCQpMYZBJLUOO8jOEP9+cbnTbqEBXfRtXdMugTpjOQRgSQ1ziCQpMYZBJLUOINAkhpnEEhS4wwCSWqcQSBJjTMIJKlxBoEkNc4gkKTGGQSS1LhegyDJ6iR7k+xLsmHE+muSHEqys3u9sc96JElP1tukc0mWAJuAK4EDwPYkM1W1e07X366q9X3VIUk6tj6PCFYB+6pqf1U9DGwB1vb4fZKkE9BnEFwI3DvUPtAtm+tVSW5P8ukky0ZtKMm6JLNJZg8dOtRHrZLUrEmfLP4ssLyqng98HvjYqE5VtbmqpqtqempqalELlKQzXZ9BcBAY/gt/abfscVX1QFV9u2t+BHhhj/VIkkboMwi2AyuTrEhyDnAVMDPcIckFQ801wJ4e65EkjdDbVUNVdTjJemAbsAS4rqp2JdkIzFbVDPDWJGuAw8DXgWv6qkeSNFqvzyyuqq3A1jnLrh16/27g3X3WIEk6tkmfLJYkTZhB
IEmNMwgkqXEGgSQ1ziCQpMYZBJLUOINAkhpnEEhS4wwCSWqcQSBJjTvmFBNJvgXU0dZX1dMXvCJJ0qI6ZhBU1XkASd4H/CXwcSDAa4ELjvFRSdJpYtyhoTVV9WtV9a2qerCqfh0fOylJZ4Rxg+Cvk7w2yZIkZyV5LfDXfRYmSVoc4wbBTwI/Dvzf7vWabpkk6TQ31vMIqupuHAqSpDPSWEcESZ6d5MYkf9K1n5/k3/ZbmiRpMYw7NPQbDJ4k9ghAVd3O4BnEkqTT3LhB8Peq6stzlh1e6GIkSYtv3CC4P8kz6W4uS/JqBvcVHFOS1Un2JtmXZMMx+r0qSSWZHrMeSdICGffh9W8GNgPPSXIQuIvBTWVHlWQJsAm4EjgAbE8yU1W75/Q7D3gb8KXjrF2StADGPSK4p6quAKaA51TVi6rqnnk+swrYV1X7q+phYAujrzx6H/BLwN+NW7QkaeGMGwR3JdkM/DDw0JifuRC4d6h9oFv2uCQ/ACyrqj841oaSrEsym2T20KFDY369JGkc4wbBc4D/wWCI6K4kH0ryopP54iRnAR8Afm6+vlW1uaqmq2p6amrqZL5WkjTHWEFQVX9TVddX1T8DXgA8HfjCPB87CCwbai/tlh1xHvBc4H8muZvB0caMJ4wlaXGN/TyCJJcn+TVgB/BUBlNOHMt2YGWSFUnOYXDfwcyRlVX1zao6v6qWV9Vy4FYGk9vNHu8PIUk6cWNdNdT9xf4V4HrgnVU174RzVXU4yXpgG7AEuK6qdiXZCMxW1cyxtyBJWgzjXj76/Kp68Hg3XlVbga1zll17lL4vOd7tS5JO3nxPKHtXVb0f+MUkT3pSWVW9tbfKJEmLYr4jgj3dv47bS9IZar5HVX62e3tHVd22CPVIkhbZuFcN/UqSPUnel+S5vVYkSVpU495H8FLgpcAh4MNJ7vB5BJJ0Zhj7PoKq+j9V9V+ANwE7gZFX/0iSTi/jPqHs+5O8J8kdwK8C/5vBncKSpNPcuPcRXMdg9tB/VFV/0WM9kqRFNm8QdM8VuKuq/vMi1CNJWmTzDg1V1aPAsm6+IEnSGWbcoaG7gFuSzACPzzNUVR/opSpJ0qIZNwi+1r3OYjB9tCTpDDFWEFTVe/suRJI0GeNOQ30zMGrSuZcteEWSpEU17tDQO4bePxV4FXB44cuRJC22cYeGdsxZdEuSL/dQjyRpkY07NPTdQ82zgGngGb1UJElaVOMODe3giXMEh4G7gTf0UZAkaXHN94SyHwTuraoVXfv1DM4P3A3s7r06SVLv5ruz+MPAwwBJXgz8B+BjwDeBzf2WJklaDPMFwZKq+nr3/ieAzVV1Q1X9AvCs+TaeZHWSvUn2JdkwYv2bumcb7EzyxSSXHP+PIEk6GfMGQZIjw0c/Atw0tG6+YaUlwCbg5cAlwNUjftF/sqqeV1WXAu8HnLJCkhbZfCeLPwV8Icn9wN8C/wsgybMYDA8dyypgX1Xt7z6zBVjL0LmFqnpwqP/TGHHTmiSpX/M9vP4Xk9wIXAB8rqqO/KI+C3jLPNu+ELh3qH0A+KG5nZK8GXg7cA4w8k7lJOuAdQAXXXTRUb/whe/8rXlKOv3s+OXXTboESWe4caahvrWqPlNVw7OO3llVty1EAVW1qaqeCfw8MPI5yFW1uaqmq2p6ampqIb5WktQZ+5nFJ+AgsGyovbRbdjRbgFf2WI8kaYQ+g2A7sDLJiu6hNlcBM8Mdkqwcav4Y8Gc91iNJGmHcO4uPW1UdTrIe2AYsAa6rql1JNgKzVTUDrE9yBfAI8A3g9X3VI0karbcgAKiqrcDWOcuuHXr/tj6/X5I0vz6HhiRJpwGDQJIaZxBIUuMMAklqnEEgSY0zCCSpcQaBJDXOIJCkxhkEktQ4g0CSGmcQSFLjDAJJapxBIEmNMwgkqXEGgSQ1ziCQpMYZBJLUOINAkhpnEEhS43oNgiSrk+xNsi/JhhHr355kd5Lbk9yY5OI+65EkPVlvQZBkCbAJeDlw
CXB1kkvmdPsKMF1Vzwc+Dby/r3okSaP1eUSwCthXVfur6mFgC7B2uENV3VxVf9M1bwWW9liPJGmEPoPgQuDeofaBbtnRvAH4wx7rkSSNcPakCwBI8lPANHD5UdavA9YBXHTRRYtYmSSd+fo8IjgILBtqL+2W/X+SXAH8G2BNVX171IaqanNVTVfV9NTUVC/FSlKr+gyC7cDKJCuSnANcBcwMd0jyAuDDDELgvh5rkSQdRW9BUFWHgfXANmAPcH1V7UqyMcmartsvA+cCv5NkZ5KZo2xOktSTXs8RVNVWYOucZdcOvb+iz++XJM3PO4slqXEGgSQ1ziCQpMYZBJLUOINAkhpnEEhS406JKSYkabF96Oc+O+kSFtz6X/knJ/Q5jwgkqXEGgSQ1ziCQpMYZBJLUOINAkhpnEEhS4wwCSWqcQSBJjTMIJKlxBoEkNc4gkKTGGQSS1DiDQJIa12sQJFmdZG+SfUk2jFj/4iS3JTmc5NV91iJJGq23IEiyBNgEvBy4BLg6ySVzuv05cA3wyb7qkCQdW5/PI1gF7Kuq/QBJtgBrgd1HOlTV3d26x3qsQ5J0DH0ODV0I3DvUPtAtO25J1iWZTTJ76NChBSlOkjRwWpwsrqrNVTVdVdNTU1OTLkeSzih9BsFBYNlQe2m3TJJ0CukzCLYDK5OsSHIOcBUw0+P3SZJOQG9BUFWHgfXANmAPcH1V7UqyMckagCQ/mOQA8Brgw0l29VWPJGm0Pq8aoqq2AlvnLLt26P12BkNGkqQJOS1OFkuS+mMQSFLjDAJJapxBIEmNMwgkqXEGgSQ1ziCQpMYZBJLUOINAkhpnEEhS4wwCSWqcQSBJjTMIJKlxBoEkNc4gkKTGGQSS1DiDQJIaZxBIUuMMAklqnEEgSY3rNQiSrE6yN8m+JBtGrP+OJL/drf9SkuV91iNJerLegiDJEmAT8HLgEuDqJJfM6fYG4BtV9Szgg8Av9VWPJGm0Po8IVgH7qmp/VT0MbAHWzumzFvhY9/7TwI8kSY81SZLmSFX1s+Hk1cDqqnpj1/5p4Ieqav1Qnz/p+hzo2l/r+tw/Z1vrgHVd8/uAvb0UfXzOB+6ft1cb3BcD7ocnuC+ecKrsi4uramrUirMXu5ITUVWbgc2TrmNYktmqmp50HacC98WA++EJ7osnnA77os+hoYPAsqH20m7ZyD5JzgaeATzQY02SpDn6DILtwMokK5KcA1wFzMzpMwO8vnv/auCm6musSpI0Um9DQ1V1OMl6YBuwBLiuqnYl2QjMVtUM8FHg40n2AV9nEBani1NqqGrC3BcD7ocnuC+ecMrvi95OFkuSTg/eWSxJjTMIJKlxBsFxSHJdkvu6+x+almRZkpuT7E6yK8nbJl3TpCR5apIvJ/lqty/eO+maJi3JkiRfSfL7k65lkpLcneSOJDuTzE66nqPxHMFxSPJi4CHgt6rquZOuZ5KSXABcUFW3JTkP2AG8sqp2T7i0RdfdDf+0qnooyVOALwJvq6pbJ1zaxCR5OzANPL2qXjHpeiYlyd3A9NybZE81HhEch6r6IwZXNzWvqv6yqm7r3n8L2ANcONmqJqMGHuqaT+lezf6FlWQp8GPARyZdi8ZjEOikdbPGvgD40oRLmZhuKGQncB/w+apqdl8A/wl4F/DYhOs4FRTwuSQ7uqlyTkkGgU5KknOBG4B/WVUPTrqeSamqR6vqUgZ30K9K0uTQYZJXAPdV1Y5J13KKeFFV/QCDWZjf3A0vn3IMAp2wbjz8BuATVfW7k67nVFBVfwXcDKyecCmTchmwphsb3wK8LMl/m2xJk1NVB7t/7wM+w2BW5lOOQaAT0p0g/Siwp6o+MOl6JinJVJLv6t5/J3Al8KcTLWpCqurdVbW0qpYzmCngpqr6qQmXNRFJntZdSEGSpwE/CpySVxwaBMchyaeAPwa+L8mBJG+YdE0TdBnw0wz+4tvZvf7xpIuakAuAm5PczmCOrc9XVdOXTQqA7wG+mOSrwJeB
P6iq/z7hmkby8lFJapxHBJLUOINAkhpnEEhS4wwCSWqcQSBJjTMI1IQkjw5d5rqzmxbjZLa3JsmG7v17krxjnv6v6Gbj/Go3Y+u/6Ja/KcnrTqYW6WR5+aiakOShqjq3p22/B3ioqv7jUdY/BbgHWFVVB5J8B7C8qvb2UY90vDwiUJOSnJvkxiS3dfPFr+2WL0/yp0n+a5I7k3wiyRVJbknyZ0lWdf2uSfKhOdt8ZpLbhtoru/Z5DJ4P/gBAVX37SAgcOZpI8r1zjlgeTXJxd9fyDUm2d6/LFmkXqSG9PbxeOsV8Zzc7KMBdwGuAf1pVDyY5H7g1yUy3/lnd+n/O4E7hnwReBKwB/jXwylFfUFVfS/LNJJdW1U7gZ4DfrKqvd9u+J8mNwO8Dn6qqx4Y++xfApQBJ3gxcXlX3JPkk8MGq+mKSi4BtwPcvyB6ROgaBWvG33eygwOPDNf++mw3yMQbPUviebvVdVXVH128XcGNVVZI7gOXzfM9HgJ/pHszyE3STjFXVG5M8D7gCeAeD+Yiumfvh7i/+n2UQPHT9LxlM7QTA05OcO/T8A+mkGQRq1WuBKeCFVfVIN1vmU7t13x7q99hQ+zHm/z9zA/DvgJuAHVX1wJEVXbjckeTjDI5Krhn+YPfUt48Ca4Z+0Z8F/HBV/d1x/XTScfAcgVr1DAbz5j+S5KXAxQux0e4X9jbg14HfhMfPR7xkqNulDE4eP647Qvkd4Oer6s6hVZ8D3jLU79KFqFMaZhCoVZ8AprvhntexsNNGf4LB0cPnunaAdyXZ252neC9PHhb6hwye8fveoRPG3wu8tavz9iS7gTctYJ0S4OWj0oLr7il4RlX9wqRrkcbhOQJpASX5DPBM4GWTrkUal0cEktQ4zxFIUuMMAklqnEEgSY0zCCSpcQaBJDXu/wE4f28lfLnksQAAAABJRU5ErkJggg==\n",
      "text/plain": [
       "<Figure size 432x288 with 1 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    }
   ],
   "source": [
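    "# Families with 5 or more members have similar survival rates and few samples, so reuse the TicketGroup binning function to merge these large families into one category\n",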
    "data['FamilySize']=data['FamilySize'].apply(TicketGroup)\n",
    "sns.barplot(x=\"FamilySize\", y=\"Survived\", data=data,ci=None)\n",
    "data"
   ]
  },
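  {
   "cell_type": "markdown",
   "id": "7b42e6d1",
   "metadata": {},
   "source": [
    "The binning above reuses the notebook's `TicketGroup` function, which is defined earlier and not repeated here. The sketch below replaces it with a hypothetical `bin_family_size` helper that caps family size at 5, so its cut-offs may differ from `TicketGroup`'s. It shows the merge followed by one-hot encoding into `FamilySize_1` through `FamilySize_5` columns, as seen in the next cell's output:\n",
    "\n",
    "```python\n",
    "import pandas as pd\n",
    "\n",
    "# Hypothetical stand-in for the TicketGroup() helper applied above;\n",
    "# the notebook's own function may use different cut-offs.\n",
    "def bin_family_size(size: int) -> int:\n",
    "    # Sizes 1-4 keep their own label; 5 or more share the label 5\n",
    "    return min(size, 5)\n",
    "\n",
    "toy = pd.DataFrame({'FamilySize': [1, 2, 3, 4, 5, 7, 11]})\n",
    "toy['FamilySize'] = toy['FamilySize'].apply(bin_family_size)\n",
    "\n",
    "# One-hot encode the binned feature: columns FamilySize_1 ... FamilySize_5\n",
    "dummies = pd.get_dummies(toy['FamilySize'], prefix='FamilySize')\n",
    "print(dummies.columns.tolist())\n",
    "```\n",
    "\n",
    "Capping the size means the sparse large-family groups share one dummy column instead of each getting its own noisy column."
   ]
  },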
  {
   "cell_type": "code",
   "execution_count": 305,
   "id": "c63998e7",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>PassengerId</th>\n",
       "      <th>Survived</th>\n",
       "      <th>Age</th>\n",
       "      <th>SibSp</th>\n",
       "      <th>Parch</th>\n",
       "      <th>Fare</th>\n",
       "      <th>Pclass_1</th>\n",
       "      <th>Pclass_2</th>\n",
       "      <th>Pclass_3</th>\n",
       "      <th>Sex_female</th>\n",
       "      <th>...</th>\n",
       "      <th>FamilySize_1</th>\n",
       "      <th>FamilySize_2</th>\n",
       "      <th>FamilySize_3</th>\n",
       "      <th>FamilySize_4</th>\n",
       "      <th>FamilySize_5</th>\n",
       "      <th>TicketGroup_1</th>\n",
       "      <th>TicketGroup_2</th>\n",
       "      <th>TicketGroup_3</th>\n",
       "      <th>TicketGroup_4</th>\n",
       "      <th>TicketGroup_5</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1</td>\n",
       "      <td>0.0</td>\n",
       "      <td>22.0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>7.2500</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2</td>\n",
       "      <td>1.0</td>\n",
       "      <td>38.0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>71.2833</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>3</td>\n",
       "      <td>1.0</td>\n",
       "      <td>26.0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>7.9250</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>...</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>4</td>\n",
       "      <td>1.0</td>\n",
       "      <td>35.0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>53.1000</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>5</td>\n",
       "      <td>0.0</td>\n",
       "      <td>35.0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>8.0500</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1304</th>\n",
       "      <td>1305</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>8.0500</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1305</th>\n",
       "      <td>1306</td>\n",
       "      <td>NaN</td>\n",
       "      <td>39.0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>108.9000</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>...</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1306</th>\n",
       "      <td>1307</td>\n",
       "      <td>NaN</td>\n",
       "      <td>38.5</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>7.2500</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1307</th>\n",
       "      <td>1308</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>8.0500</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1308</th>\n",
       "      <td>1309</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>22.3583</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>1307 rows × 30 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "      PassengerId  Survived   Age  SibSp  Parch      Fare  Pclass_1  Pclass_2  \\\n",
       "0               1       0.0  22.0      1      0    7.2500         0         0   \n",
       "1               2       1.0  38.0      1      0   71.2833         1         0   \n",
       "2               3       1.0  26.0      0      0    7.9250         0         0   \n",
       "3               4       1.0  35.0      1      0   53.1000         1         0   \n",
       "4               5       0.0  35.0      0      0    8.0500         0         0   \n",
       "...           ...       ...   ...    ...    ...       ...       ...       ...   \n",
       "1304         1305       NaN   NaN      0      0    8.0500         0         0   \n",
       "1305         1306       NaN  39.0      0      0  108.9000         1         0   \n",
       "1306         1307       NaN  38.5      0      0    7.2500         0         0   \n",
       "1307         1308       NaN   NaN      0      0    8.0500         0         0   \n",
       "1308         1309       NaN   NaN      1      1   22.3583         0         0   \n",
       "\n",
       "      Pclass_3  Sex_female  ...  FamilySize_1  FamilySize_2  FamilySize_3  \\\n",
       "0            1           0  ...             0             1             0   \n",
       "1            0           1  ...             0             1             0   \n",
       "2            1           1  ...             1             0             0   \n",
       "3            0           1  ...             0             1             0   \n",
       "4            1           0  ...             1             0             0   \n",
       "...        ...         ...  ...           ...           ...           ...   \n",
       "1304         1           0  ...             1             0             0   \n",
       "1305         0           1  ...             1             0             0   \n",
       "1306         1           0  ...             1             0             0   \n",
       "1307         1           0  ...             1             0             0   \n",
       "1308         1           0  ...             0             0             1   \n",
       "\n",
       "      FamilySize_4  FamilySize_5  TicketGroup_1  TicketGroup_2  TicketGroup_3  \\\n",
       "0                0             0              1              0              0   \n",
       "1                0             0              0              1              0   \n",
       "2                0             0              1              0              0   \n",
       "3                0             0              0              1              0   \n",
       "4                0             0              1              0              0   \n",
       "...            ...           ...            ...            ...            ...   \n",
       "1304             0             0              1              0              0   \n",
       "1305             0             0              0              0              1   \n",
       "1306             0             0              1              0              0   \n",
       "1307             0             0              1              0              0   \n",
       "1308             0             0              0              0              1   \n",
       "\n",
       "      TicketGroup_4  TicketGroup_5  \n",
       "0                 0              0  \n",
       "1                 0              0  \n",
       "2                 0              0  \n",
       "3                 0              0  \n",
       "4                 0              0  \n",
       "...             ...            ...  \n",
       "1304              0              0  \n",
       "1305              0              0  \n",
       "1306              0              0  \n",
       "1307              0              0  \n",
       "1308              0              0  \n",
       "\n",
       "[1307 rows x 30 columns]"
      ]
     },
     "execution_count": 305,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 2) One-hot encode the categorical features with pd.get_dummies\n",
    "categoryList = ['Pclass', 'Sex', 'Embarked', 'Title', 'FamilySize', 'TicketGroup']\n",
    "for col in categoryList:\n",
    "    oneHot = pd.get_dummies(data[col], prefix=col, prefix_sep='_')\n",
    "    data = pd.concat([data, oneHot], axis=1)\n",
    "    del data[col]\n",
    "\n",
    "data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 306,
   "id": "a8362df8",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<class 'pandas.core.frame.DataFrame'>\n",
      "Int64Index: 1307 entries, 0 to 1308\n",
      "Data columns (total 30 columns):\n",
      " #   Column         Non-Null Count  Dtype  \n",
      "---  ------         --------------  -----  \n",
      " 0   PassengerId    1307 non-null   int64  \n",
      " 1   Survived       889 non-null    float64\n",
      " 2   Age            1044 non-null   float64\n",
      " 3   SibSp          1307 non-null   int64  \n",
      " 4   Parch          1307 non-null   int64  \n",
      " 5   Fare           1306 non-null   float64\n",
      " 6   Pclass_1       1307 non-null   uint8  \n",
      " 7   Pclass_2       1307 non-null   uint8  \n",
      " 8   Pclass_3       1307 non-null   uint8  \n",
      " 9   Sex_female     1307 non-null   uint8  \n",
      " 10  Sex_male       1307 non-null   uint8  \n",
      " 11  Embarked_C     1307 non-null   uint8  \n",
      " 12  Embarked_Q     1307 non-null   uint8  \n",
      " 13  Embarked_S     1307 non-null   uint8  \n",
      " 14  Title_Master   1307 non-null   uint8  \n",
      " 15  Title_Miss     1307 non-null   uint8  \n",
      " 16  Title_Mr       1307 non-null   uint8  \n",
      " 17  Title_Mrs      1307 non-null   uint8  \n",
      " 18  Title_Officer  1307 non-null   uint8  \n",
      " 19  Title_Royalty  1307 non-null   uint8  \n",
      " 20  FamilySize_1   1307 non-null   uint8  \n",
      " 21  FamilySize_2   1307 non-null   uint8  \n",
      " 22  FamilySize_3   1307 non-null   uint8  \n",
      " 23  FamilySize_4   1307 non-null   uint8  \n",
      " 24  FamilySize_5   1307 non-null   uint8  \n",
      " 25  TicketGroup_1  1307 non-null   uint8  \n",
      " 26  TicketGroup_2  1307 non-null   uint8  \n",
      " 27  TicketGroup_3  1307 non-null   uint8  \n",
      " 28  TicketGroup_4  1307 non-null   uint8  \n",
      " 29  TicketGroup_5  1307 non-null   uint8  \n",
      "dtypes: float64(3), int64(3), uint8(24)\n",
      "memory usage: 134.4 KB\n"
     ]
    }
   ],
   "source": [
    "data.info()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a64fb8e5",
   "metadata": {},
   "source": [
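    "##### Handling missing values\n",
    "\n",
    "(1) Missing Fare value in the test set\n",
    "\n",
    "    Only one Fare value is missing (in the test set), so it is filled with the mean Fare of all passengers."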
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 307,
   "id": "d35063fd",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Fill the single missing Fare with the mean Fare of all passengers\n",
    "data['Fare'] = data['Fare'].fillna(data['Fare'].mean())"
   ]
  },
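  {
   "cell_type": "markdown",
   "id": "b3e1c7a2",
   "metadata": {},
   "source": [
    "Fare is strongly right-skewed, so the mean can be pulled up by a few very expensive tickets. A common alternative, not used in this report, is to fill a missing fare with the median fare of the passenger's class. The cell below is a minimal sketch of that idea on a small hypothetical DataFrame with a raw `Pclass` column. In this notebook `Pclass` has already been one-hot encoded, so the class would first have to be recovered from `Pclass_1` to `Pclass_3`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "4d8f2a91",
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import pandas as pd\n",
    "\n",
    "# Hypothetical toy data: one missing Fare in third class\n",
    "toy = pd.DataFrame({\n",
    "    'Pclass': [1, 1, 3, 3, 3, 3],\n",
    "    'Fare': [80.0, 120.0, 7.25, 8.05, np.nan, 7.90],\n",
    "})\n",
    "\n",
    "# Median Fare within each class (NaN ignored), aligned back to every row of that class\n",
    "class_median = toy.groupby('Pclass')['Fare'].transform('median')\n",
    "toy['Fare'] = toy['Fare'].fillna(class_median)\n",
    "toy"
   ]
  },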
  {
   "cell_type": "markdown",
   "id": "9a52d297",
   "metadata": {},
   "source": [
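    "(2) Missing values in Age (numeric feature)\n",
    "\n",
    "    To fill the missing ages, Age is treated as a new label: a random-forest regressor is trained on the passengers whose age is known, and its predictions replace the missing values."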
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 308,
   "id": "cccdd505",
   "metadata": {},
   "outputs": [],
   "source": [
    "from sklearn.ensemble import RandomForestRegressor\n",
    "\n",
    "def set_missing_ages(df):\n",
    "    # Use every column except PassengerId and Survived; Age becomes the first column\n",
    "    age_df = df.drop(columns=['PassengerId', 'Survived'])\n",
    "    # Split passengers into those with a known age and those with a missing age\n",
    "    known_age = age_df[age_df.Age.notnull()].values\n",
    "    unknown_age = age_df[age_df.Age.isnull()].values\n",
    "\n",
    "    # y: the target age (first column), used as the new label\n",
    "    y = known_age[:, 0]\n",
    "    # X: the remaining feature columns\n",
    "    X = known_age[:, 1:]\n",
    "\n",
    "    # Fit a random-forest regressor on the passengers with a known age\n",
    "    rfr = RandomForestRegressor(random_state=0, n_estimators=100, n_jobs=-1)\n",
    "    rfr.fit(X, y)\n",
    "\n",
    "    # Predict the missing ages and write them back into df (in place)\n",
    "    predictedAges = rfr.predict(unknown_age[:, 1:])\n",
    "    df.loc[df.Age.isnull(), 'Age'] = predictedAges\n",
    "\n",
    "    return df, rfr\n",
    "\n",
    "data, rfr = set_missing_ages(data)"
   ]
  },
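  {
   "cell_type": "markdown",
   "id": "5c9d2b84",
   "metadata": {},
   "source": [
    "`set_missing_ages` relies on `Age` being the first of its remaining columns and writes the predictions into `data` in place. The cell below sketches the same random-forest imputation with the target and feature columns selected by name, returning a filled copy instead. The helper `impute_with_forest` and the toy data are hypothetical and not part of the original analysis. On this notebook's data it could be called as `impute_with_forest(data, 'Age', [c for c in data.columns if c not in ('PassengerId', 'Survived', 'Age')])`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e1f0a7c3",
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import pandas as pd\n",
    "from sklearn.ensemble import RandomForestRegressor\n",
    "\n",
    "def impute_with_forest(df, target, features, random_state=0):\n",
    "    # Return a copy of df whose NaNs in `target` are replaced by random-forest predictions\n",
    "    df = df.copy()\n",
    "    missing = df[target].isnull()\n",
    "    if not missing.any():\n",
    "        return df\n",
    "    rfr = RandomForestRegressor(n_estimators=100, random_state=random_state)\n",
    "    # Train only on rows where the target is known\n",
    "    rfr.fit(df.loc[~missing, features], df.loc[~missing, target])\n",
    "    # Predict the target for rows where it is missing\n",
    "    df.loc[missing, target] = rfr.predict(df.loc[missing, features])\n",
    "    return df\n",
    "\n",
    "# Hypothetical toy data with two missing ages\n",
    "toy = pd.DataFrame({\n",
    "    'Age': [22.0, 38.0, np.nan, 35.0, np.nan, 54.0],\n",
    "    'Fare': [7.25, 71.28, 7.92, 53.10, 8.05, 51.86],\n",
    "    'Pclass_1': [0, 1, 0, 1, 0, 1],\n",
    "})\n",
    "filled = impute_with_forest(toy, 'Age', ['Fare', 'Pclass_1'])\n",
    "filled"
   ]
  },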
  {
   "cell_type": "code",
   "execution_count": 311,
   "id": "1bcb6255",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>PassengerId</th>\n",
       "      <th>Survived</th>\n",
       "      <th>Age</th>\n",
       "      <th>SibSp</th>\n",
       "      <th>Parch</th>\n",
       "      <th>Fare</th>\n",
       "      <th>Pclass_1</th>\n",
       "      <th>Pclass_2</th>\n",
       "      <th>Pclass_3</th>\n",
       "      <th>Sex_female</th>\n",
       "      <th>...</th>\n",
       "      <th>FamilySize_3</th>\n",
       "      <th>FamilySize_4</th>\n",
       "      <th>FamilySize_5</th>\n",
       "      <th>TicketGroup_1</th>\n",
       "      <th>TicketGroup_2</th>\n",
       "      <th>TicketGroup_3</th>\n",
       "      <th>TicketGroup_4</th>\n",
       "      <th>TicketGroup_5</th>\n",
       "      <th>Age_scaled</th>\n",
       "      <th>Fare_scaled</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1</td>\n",
       "      <td>0.0</td>\n",
       "      <td>22.000000</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>7.2500</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>-0.576259</td>\n",
       "      <td>-0.502142</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2</td>\n",
       "      <td>1.0</td>\n",
       "      <td>38.000000</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>71.2833</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0.602143</td>\n",
       "      <td>0.735782</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>3</td>\n",
       "      <td>1.0</td>\n",
       "      <td>26.000000</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>7.9250</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>-0.281659</td>\n",
       "      <td>-0.489092</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>4</td>\n",
       "      <td>1.0</td>\n",
       "      <td>35.000000</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>53.1000</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0.381192</td>\n",
       "      <td>0.384254</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>5</td>\n",
       "      <td>0.0</td>\n",
       "      <td>35.000000</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>8.0500</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0.381192</td>\n",
       "      <td>-0.486676</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1304</th>\n",
       "      <td>1305</td>\n",
       "      <td>NaN</td>\n",
       "      <td>29.830457</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>8.0500</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0.000455</td>\n",
       "      <td>-0.486676</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1305</th>\n",
       "      <td>1306</td>\n",
       "      <td>NaN</td>\n",
       "      <td>39.000000</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>108.9000</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0.675793</td>\n",
       "      <td>1.463007</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1306</th>\n",
       "      <td>1307</td>\n",
       "      <td>NaN</td>\n",
       "      <td>38.500000</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>7.2500</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0.638968</td>\n",
       "      <td>-0.502142</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1307</th>\n",
       "      <td>1308</td>\n",
       "      <td>NaN</td>\n",
       "      <td>29.830457</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>8.0500</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0.000455</td>\n",
       "      <td>-0.486676</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1308</th>\n",
       "      <td>1309</td>\n",
       "      <td>NaN</td>\n",
       "      <td>5.890700</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>22.3583</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>-1.762711</td>\n",
       "      <td>-0.210060</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>1307 rows × 32 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "      PassengerId  Survived        Age  SibSp  Parch      Fare  Pclass_1  \\\n",
       "0               1       0.0  22.000000      1      0    7.2500         0   \n",
       "1               2       1.0  38.000000      1      0   71.2833         1   \n",
       "2               3       1.0  26.000000      0      0    7.9250         0   \n",
       "3               4       1.0  35.000000      1      0   53.1000         1   \n",
       "4               5       0.0  35.000000      0      0    8.0500         0   \n",
       "...           ...       ...        ...    ...    ...       ...       ...   \n",
       "1304         1305       NaN  29.830457      0      0    8.0500         0   \n",
       "1305         1306       NaN  39.000000      0      0  108.9000         1   \n",
       "1306         1307       NaN  38.500000      0      0    7.2500         0   \n",
       "1307         1308       NaN  29.830457      0      0    8.0500         0   \n",
       "1308         1309       NaN   5.890700      1      1   22.3583         0   \n",
       "\n",
       "      Pclass_2  Pclass_3  Sex_female  ...  FamilySize_3  FamilySize_4  \\\n",
       "0            0         1           0  ...             0             0   \n",
       "1            0         0           1  ...             0             0   \n",
       "2            0         1           1  ...             0             0   \n",
       "3            0         0           1  ...             0             0   \n",
       "4            0         1           0  ...             0             0   \n",
       "...        ...       ...         ...  ...           ...           ...   \n",
       "1304         0         1           0  ...             0             0   \n",
       "1305         0         0           1  ...             0             0   \n",
       "1306         0         1           0  ...             0             0   \n",
       "1307         0         1           0  ...             0             0   \n",
       "1308         0         1           0  ...             1             0   \n",
       "\n",
       "      FamilySize_5  TicketGroup_1  TicketGroup_2  TicketGroup_3  \\\n",
       "0                0              1              0              0   \n",
       "1                0              0              1              0   \n",
       "2                0              1              0              0   \n",
       "3                0              0              1              0   \n",
       "4                0              1              0              0   \n",
       "...            ...            ...            ...            ...   \n",
       "1304             0              1              0              0   \n",
       "1305             0              0              0              1   \n",
       "1306             0              1              0              0   \n",
       "1307             0              1              0              0   \n",
       "1308             0              0              0              1   \n",
       "\n",
       "      TicketGroup_4  TicketGroup_5  Age_scaled  Fare_scaled  \n",
       "0                 0              0   -0.576259    -0.502142  \n",
       "1                 0              0    0.602143     0.735782  \n",
       "2                 0              0   -0.281659    -0.489092  \n",
       "3                 0              0    0.381192     0.384254  \n",
       "4                 0              0    0.381192    -0.486676  \n",
       "...             ...            ...         ...          ...  \n",
       "1304              0              0    0.000455    -0.486676  \n",
       "1305              0              0    0.675793     1.463007  \n",
       "1306              0              0    0.638968    -0.502142  \n",
       "1307              0              0    0.000455    -0.486676  \n",
       "1308              0              0   -1.762711    -0.210060  \n",
       "\n",
       "[1307 rows x 32 columns]"
      ]
     },
     "execution_count": 311,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Standardize Age and Fare to zero mean and unit variance\n",
    "import sklearn.preprocessing as preprocessing\n",
    "# Fit a separate scaler for each column so the two fits do not overwrite each other\n",
    "data['Age_scaled'] = preprocessing.StandardScaler().fit_transform(data[['Age']]).ravel()\n",
    "data['Fare_scaled'] = preprocessing.StandardScaler().fit_transform(data[['Fare']]).ravel()\n",
    "del data['PassengerId']\n",
    "data"
   ]
  },
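  {
   "cell_type": "markdown",
   "id": "8a6e3f17",
   "metadata": {},
   "source": [
    "The scaled columns above are computed from the training and test passengers together, so the test rows influence the mean and standard deviation. This is common in Kaggle notebooks; a stricter option is to fit the scaler on the training rows only and reuse it, via `transform`, on the test rows. A minimal sketch with hypothetical Age values:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "2f7b9d05",
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "from sklearn.preprocessing import StandardScaler\n",
    "\n",
    "# Hypothetical Age values (one feature column, hence the 2-D shape)\n",
    "age_train = np.array([[22.0], [38.0], [26.0], [35.0]])\n",
    "age_test = np.array([[30.0], [62.0]])\n",
    "\n",
    "scaler = StandardScaler()\n",
    "# Learn the mean and standard deviation from the training rows only\n",
    "age_train_scaled = scaler.fit_transform(age_train)\n",
    "# Apply the same mean and standard deviation to the test rows\n",
    "age_test_scaled = scaler.transform(age_test)\n",
    "print(scaler.mean_, age_test_scaled.ravel())"
   ]
  },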
  {
   "cell_type": "markdown",
   "id": "0d7752b7",
   "metadata": {},
   "source": [
    "### Model training and parameter tuning"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 340,
   "id": "bac467f3",
   "metadata": {},
   "outputs": [],
   "source": [
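    "# Split the combined data back into the labelled training rows (index 0-890)\n",
    "# and the unlabelled test rows (index 891 onward); .loc slicing is label-based and inclusive\n",
    "from sklearn.model_selection import cross_val_score, train_test_split\n",
    "\n",
    "train = data.loc[0:890, :].copy()\n",
    "test = data.loc[891:, :].drop(columns='Survived')\n",
    "\n",
    "y = train.pop('Survived')\n",
    "X = train\n",
    "\n",
    "# Hold out 30% of the labelled rows as a validation set\n",
    "X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0, test_size=0.3)"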
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 343,
   "id": "813b021d",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0.7865168539325843"
      ]
     },
     "execution_count": 343,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
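    "from sklearn.linear_model import LogisticRegression\n",
    "import warnings\n",
    "# Silence warnings (e.g. a ConvergenceWarning from the solver) to keep the output short\n",
    "warnings.filterwarnings('ignore')\n",
    "\n",
    "# Logistic regression with default settings; score() returns accuracy on the validation set\n",
    "model = LogisticRegression()\n",
    "model.fit(X_train, y_train)\n",
    "model.score(X_val, y_val)"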
   ]
  },
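  {
   "cell_type": "markdown",
   "id": "c6a04e2d",
   "metadata": {},
   "source": [
    "The section title mentions parameter tuning, but the model above uses scikit-learn's default settings and a single 70/30 validation split. One way to tune the regularization strength `C` is a cross-validated grid search with `GridSearchCV`. The cell below is a sketch on synthetic data from `make_classification` so that it runs on its own; the grid of `C` values is an arbitrary choice. On this notebook's data one would call `search.fit(X, y)` with the training features and labels instead, and `search.best_estimator_` (refit on all the data passed to `fit`) could then be used for prediction."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "9e3b51f8",
   "metadata": {},
   "outputs": [],
   "source": [
    "from sklearn.datasets import make_classification\n",
    "from sklearn.linear_model import LogisticRegression\n",
    "from sklearn.model_selection import GridSearchCV\n",
    "\n",
    "# Synthetic binary classification data standing in for the Titanic features\n",
    "X_demo, y_demo = make_classification(n_samples=300, n_features=10, random_state=0)\n",
    "\n",
    "# Candidate regularization strengths; smaller C means stronger L2 regularization\n",
    "param_grid = {'C': [0.01, 0.1, 1, 10, 100]}\n",
    "search = GridSearchCV(LogisticRegression(max_iter=1000), param_grid, cv=5, scoring='accuracy')\n",
    "search.fit(X_demo, y_demo)\n",
    "print(search.best_params_, round(search.best_score_, 3))"
   ]
  },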
  {
   "cell_type": "markdown",
   "id": "8131142b",
   "metadata": {},
   "source": [
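    "### Summary\n",
    "    (1) Feature engineering and the choice of model parameters have a large effect on the predictions.\n",
    "\n",
    "    (2) The features could be analyzed in more detail to extract additional features for the model.\n",
    "\n",
    "    (3) Algorithms are the focus of coursework, but most of them can be called directly from existing libraries; what matters is understanding the reasoning behind them."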
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.13"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
